Abstract


In healthcare settings, there is often a lack of trust between the healthcare authority and the
patient concerning the security of medical records. Hospitals are notorious for having
poor security in patient record management. A system in which patients have autonomy
over their medical records and over who can view them is a promising alternative.


Hospitals and healthcare institutions have been the target of cyber attacks for years. Last
year in Ireland, the Health Service Executive was targeted by a ransomware attack that saw
520 patients’ medical records stolen. Existing solutions for storing medical records using
blockchain technologies suffer from over-reliance on centralised cloud servers to store keys,
from privacy issues, and from allowing attackers to infer information about patients from
their blockchain activity.


This project provides a framework for indexing and securing a user’s records using an
Ethereum blockchain, with emphasis placed on the healthcare setting. The records are secured
using iris authentication and the patient’s personally identifiable information. The patient can
grant and revoke access to their records for individual healthcare authorities, and the
InterPlanetary Name System is used for off-chain record storage. The framework is modular and
can be adapted for use in other environments, such as proof of ownership of tickets and storing
travel documents for verification by border control.


A smart contract is used to store the hashes of the patient’s iris scans on an Ethereum
Virtual Machine compatible blockchain. Privacy-preserving identifiers are used to anonymise
the patient and the location of their records on the blockchain. The cost and time required to
deanonymise a user were found to be high enough to make such an attack infeasible. Iris
matching accuracy for hashed irises was sufficient for the proposed use case.


CHAPTER 1


Introduction 


Hospitals are responsible for storing vast amounts of sensitive patient information over long
periods. The healthcare industry suffers greatly from poor security and outdated, flawed
systems. Last year, the Health Service Executive (HSE) fell victim to a ransomware cyberattack
and suffered a significant data breach. The attack crippled its systems, forced it to cancel
some outpatient appointments, halted the reporting of COVID-19 cases in Ireland, and caused
520 patients’ medical records to be stolen. The full impact of the ransomware attack on public
health is not currently known, but it is expected to have significantly affected patient care.
Known effects include cancelled radiotherapy treatment for cancer patients and consultant
shortages [39].


Blockchain technology was initially thought to serve only as a financial instrument. However,
it became clear that it had many other real-world applications, especially in cases where
transparency and decentralisation are essential. eHealth is an emerging healthcare practice
with a broad scope ranging from telemedicine to anything involving computers in medicine.
Applying blockchain technology to eHealth is a promising research area. Giving patients full
transparency on how their medical records are stored and who has access could improve their
confidence in the healthcare industry.


An electronic health record (EHR) is a collection of health information about individual
patients or populations. EHRs have benefits over traditional paper-based health records,
including being easily accessible, transportable, and resistant to physical damage. The US
government incentivised widespread use of EHRs when the Health Information Technology
for Economic and Clinical Health (HITECH) Act of 2009 was signed. Typically, healthcare
systems using EHRs rely on custom in-house solutions. With the development of powerful and
low-cost cloud computing, in-house solutions have become less attractive due to setup,
maintenance, and associated costs. However, patients are unlikely to trust cloud service
providers with storing their EHRs.


An EHR storage system using blockchain technologies could serve as an alternative to
centralised cloud servers. Firstly, it does not require hospitals and other medical institutions
to have in-house solutions for storing EHRs. Secondly, it can reduce patients’ fear of a cloud
server having control of their EHRs. Lastly, it allows patients to have complete control and
autonomy over their EHRs and who can access them. A biometric scan is a form of
authentication which could be used in such a system. Iris authentication was chosen for
this project because of its higher accuracy, greater stability and lower intrusiveness, and
because of previous work on iris authentication for a blockchain-based vaccination passport
[3,9]. The advantage of biometric authentication is that the user does not need to securely
store a 256-bit private key, as is usually the case with blockchain authentication.
Password-based authentication cannot be used due to its susceptibility to offline dictionary
attacks [7].


Storing biometric information directly on the blockchain is not feasible due to the leakage of
personal information that would ensue and the large template size of the biometric data.
Instead, the raw iris data is hashed with a locality-sensitive hashing (LSH) algorithm to reduce
the size of the data and obfuscate the patient’s raw iris data. A public identifier identifies the
patient to the healthcare institution when retrieving their EHRs with iris authentication. A
private identifier, which only the patient can create, allows them to find their EHRs, decrypt
them and share them with selected institutions. The patient’s encrypted EHRs are stored on
an off-chain decentralised file storage network such as the InterPlanetary File System (IPFS)
or the InterPlanetary Name System (IPNS).


1.1 Objective 


The project’s primary objective is to propose a framework for securing a patient’s EHRs in
an eHealth setting. The framework is modular, and it can be used in many applications and
settings. The components can be modified and interchanged as the state of technology
develops over time. The framework consists of several components. An iris template extractor
is used to produce a vector from a patient’s iris scan. An LSH algorithm is applied to the
extracted iris vector to obfuscate the content of a patient’s iris scan while providing a way
to find the original hash later from a new scan. A smart contract deployed on an Ethereum
Virtual Machine (EVM) compatible blockchain is used to store the hash of the iris scan and
the identifiers generated by the patient to index their EHRs. Regenerable public and private
identifiers are used to encrypt the stored EHRs so that they can be decrypted at a later date
with no requirement to store private keys securely. A decentralised file storage system
such as IPFS or IPNS is the preferred mechanism to store EHRs; however, any storage provider
is compatible with the framework. The desired characteristics of the framework are as follows:


• The framework should be modular. Most of the components can be interchanged by
institutions that want the system to have different properties, or as technology develops
over time. It should also be usable in a variety of applications and settings.


• The framework should be privacy-preserving. It should not be possible to infer who a
specific patient is, learn their personally identifiable information (PII), or decrypt their
EHRs through an attack on the system.


• The patient should be able to regenerate their identifiers from the blockchain with
high probability and in a relatively time-efficient manner.


1.1.1 Applications 


Four applications for the framework were identified during the project.
1. Storing EHRs for patients, with full control in the hands of the patients.
2. A vaccination passport system capable of storing vaccination records anonymously.
3. Storing identity documents for travel purposes.
4. Proof of entry (tickets) for events.


1.2 Readers Guide 


Chapter 2 discusses background information key to understanding the project. This includes,
but is not limited to, iris extraction techniques, LSH, blockchain, smart contract technology,
key derivation functions and decentralised storage systems. Chapter 3 discusses the
framework’s design and gives a high-level overview. This serves as a preamble to Chapter 4,
which gives a detailed explanation and an opinionated implementation of the proposed
framework. Chapter 5 discusses experiments and security, and evaluates performance metrics.
Chapter 6 concludes the final year project and discusses future work and limitations.


CHAPTER 2 


Background 


This chapter explores relevant background information regarding the technology used for
this final year project. Iris authentication, LSH, blockchain technologies, General Data
Protection Regulation practices on the blockchain, decentralised file storage systems and key
derivation functions are discussed.


2.1 Iris Authentication 


Password and PIN code based authentication remains the most widely used authentication
technique in most settings. For most applications, it provides a cheap and efficient way
for users to authenticate with a system. In recent years, other forms of authentication have
become more mainstream, the most widely used being fingerprint and facial recognition on
smartphones. Password-based authentication cannot be used on the blockchain due to its
susceptibility to offline dictionary attacks. Unlike traditional databases, blockchains cannot
deploy countermeasures such as rate-limiting login attempts, as they are usually entirely
public. Cryptographic private keys are used to authenticate users on the blockchain. They are
secure; however, they require users to store the key securely. Cold storage devices such as
those manufactured by Ledger and Trezor are popular and secure ways for users to manage
their private keys and maintain access to their cryptocurrency wallets. They require the user
to input a PIN code to unlock the device and use anti-tamper techniques to prevent them from
being opened up and attacked [20]. Alternatively, users may choose to record a twelve or
twenty-four-word mnemonic called a seed phrase on paper or etched on a fireproof material
such as stainless steel. The seed phrase represents the user’s private key in a form that
allows the user to memorise part of the phrase or its entirety. Securing the key in a safety
deposit box or a fireproof safe is a common practice. This is costly and still not perfectly
secure. The user may become the target of criminals, or the government may request that their
safety deposit box be opened.


British ophthalmologist J.H. Doggart was the first person to suggest that the irises of his
subjects were unique and that their features “represent a series of variable factors whose
conceivable permutations and combinations are almost infinite” [1]. Current research suggests
that irises remain constant over a person’s lifetime, provided the person does not suffer an
eye injury or contract a disease of the eye [27]. 1.5 billion people worldwide have their iris
scan enrolled in a national identification scheme, of which 1.2 billion are in the Unique
Identification Authority of India (UIDAI) programme [41]. This growing number of registered
iris scans can be attributed to technological developments in cameras and iris extraction
techniques and to the high accuracy of iris scans in practical applications. There have been
several previous applications of iris scans using blockchain technologies, including storing
COVID-19 vaccination records and the World Food Programme’s project Building Blocks [9,49].
Some of this project’s foundations build on the concepts and techniques used in the COVID-19
vaccination storage paper. Building Blocks allows over one hundred thousand refugees living
in the Azraq refugee camp in Jordan to purchase goods from authorised stores in the vicinity
of the camp by authenticating themselves with their iris scan.


Iris feature extraction typically consists of three stages. 


1. A reference image or video frame is given. The inner and outer boundaries of the iris are
detected, and then the eyelids and eyelashes are excluded from the image.


2. The circular iris is transformed to a non-concentric polar representation, which is known
as Daugman’s rubber sheet model [12]. This is a form of normalisation.


3. Features are extracted from the normalised image. Examples of iris features include
arching ligaments, furrows, ridges, crypts, rings, corona and freckles.


The encoded output is compared to other outputs. The most common matching metric is
Hamming distance. A threshold is set to achieve a favourable trade-off between the false
acceptance rate and the false rejection rate.


2.1.1 John Daugman 


John Daugman is a British-American professor of computer vision at the University of
Cambridge. Daugman is credited with creating and patenting the first fully functional iris
authentication system [11]. Much subsequent research on iris authentication systems is based
on Daugman’s work.


The first step in Daugman’s algorithm is the detection of the inner and outer boundaries of
the iris. Integro-differential operators are used to achieve this. One such integro-differential
operator is shown in Equation 2.1.


$$\max_{(r, x_0, y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{r, x_0, y_0} \frac{I(x, y)}{2\pi r} \, ds \right| \tag{2.1}$$


$*$ is the convolution operator, and $G_\sigma(r)$ is a smoothing function such as a Gaussian
of scale $\sigma$. Effectively, this serves as a circular edge detector. The image is searched
for the maximum of the blurred partial derivative, with respect to increasing radius $r$, of the
normalised contour integral of the image $I(x, y)$ along a circular arc $ds$ of radius $r$ and
centre $(x_0, y_0)$. The blurring factor $\sigma$ is adjusted as the search progresses.
Initially, it is set so that only the transition from the iris to the sclera (the white part of
the eye) is detected. As the radius and centre coordinates $(r, x_0, y_0)$ narrow in on the
maximum, $\sigma$ is set to a finer convolution scale so that the boundary around the pupil
becomes more apparent, and the range of the triple $(r, x_0, y_0)$ is restricted. Initially, the
range of $ds$ is restricted to the horizontal opposing 90° cones across the centre of the iris,
as shown in Figure 2.1. This is done because the eyelids can obscure part of the iris. Then, in
tandem with the change in $\sigma$, the range of $ds$ is set to the upper 270°, as shown in
Figure 2.2. This is done because the lower 90° can suffer from corneal specular reflection from
illumination below the video camera. Once the maximum has been obtained, there is a
subsequent search for the pupillary boundary.
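
The circular edge search can be illustrated with a minimal sketch. It is not Daugman’s implementation: it integrates over the full circle rather than the restricted arcs, keeps $\sigma$ fixed, searches a small hand-picked grid of candidate centres, and the names `circle_mean` and `find_circle` are hypothetical.

```python
import numpy as np

def circle_mean(img, x0, y0, r, samples=64):
    """Approximate the normalised contour integral of img along a circle."""
    t = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    xs = np.clip(np.rint(x0 + r * np.cos(t)).astype(int), 0, img.shape[1] - 1)
    ys = np.clip(np.rint(y0 + r * np.sin(t)).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs].mean()

def find_circle(img, centres, radii, sigma=1.0):
    """Return the (x0, y0, r) with the largest blurred radial derivative."""
    # Discrete Gaussian standing in for G_sigma(r) (7 taps, an assumption).
    k = np.arange(-3, 4)
    gauss = np.exp(-k ** 2 / (2.0 * sigma ** 2))
    gauss /= gauss.sum()
    best_score, best = -1.0, None
    for x0, y0 in centres:
        means = np.array([circle_mean(img, x0, y0, r) for r in radii])
        deriv = np.diff(means)  # finite-difference derivative with respect to r
        score = np.abs(np.convolve(deriv, gauss, mode="same"))
        i = int(np.argmax(score))
        if score[i] > best_score:
            best_score, best = score[i], (x0, y0, radii[i])
    return best

# Synthetic image: a dark disc of radius 20 centred at (50, 50) on a bright background.
yy, xx = np.mgrid[0:101, 0:101]
image = np.where((xx - 50) ** 2 + (yy - 50) ** 2 <= 20 ** 2, 0.0, 1.0)
centres = [(x, y) for x in range(47, 54) for y in range(47, 54)]
print(find_circle(image, centres, list(range(5, 40))))
```

On the synthetic disc, the strongest response should occur at the true centre and at a radius next to the disc boundary, because a misplaced centre spreads the intensity jump over several radii.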


Figure 2.1: ds restricted to opposing 90° cones


Figure 2.2: ds restricted to upper 270°


Next, the upper and lower eyelid boundaries are detected. Equation 2.1 is modified slightly
so that the contour of integration is arcuate rather than circular. Iris images and video
frames are discarded if the upper and lower eyelids hide more than 50% of the total iris. The
typical scenario for this is a blink.


Daugman uses a rubber sheet model to normalise the image $I(x, y)$. Each point on the iris
is assigned real coordinates $(r, \theta)$. $r$ is the distance expressed in terms of the radius
and lies on the interval $[0, 1]$. $\theta$ is the angle relative to the pupil and lies on the
interval $[0, 2\pi]$. The expression for this is shown in Equation 2.2.


$$I(x(r, \theta), y(r, \theta)) \rightarrow I(r, \theta) \tag{2.2}$$
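
A minimal sketch of this remapping is given below. It assumes the pupil and iris boundaries are already known as circles and interpolates linearly between them for each angle; nearest-pixel sampling, the default resolutions and the name `rubber_sheet` are illustrative choices, not taken from any of the systems discussed.

```python
import numpy as np

def rubber_sheet(img, pupil, iris, radial_res=20, angular_res=240):
    """Unwrap the iris annulus into a (radial_res x angular_res) array.

    pupil and iris are (x, y, radius) circles and need not be concentric.
    """
    px, py, pr = pupil
    ix, iy, ir = iris
    # Half-open angular range so that 0 and 2*pi are not both sampled.
    theta = np.linspace(0.0, 2.0 * np.pi, angular_res, endpoint=False)
    r = np.linspace(0.0, 1.0, radial_res)[:, None]
    # Boundary points on the pupil and iris circles for every angle.
    xp, yp = px + pr * np.cos(theta), py + pr * np.sin(theta)
    xi, yi = ix + ir * np.cos(theta), iy + ir * np.sin(theta)
    # r = 0 lies on the pupil boundary, r = 1 on the iris boundary.
    x = (1.0 - r) * xp + r * xi
    y = (1.0 - r) * yp + r * yi
    cols = np.clip(np.rint(x).astype(int), 0, img.shape[1] - 1)
    rows = np.clip(np.rint(y).astype(int), 0, img.shape[0] - 1)
    return img[rows, cols]

# Demo image whose pixel value is its distance from the point (50, 50).
yy, xx = np.mgrid[0:101, 0:101]
distance_image = np.hypot(xx - 50, yy - 50)
unwrapped = rubber_sheet(distance_image, (50, 50, 10), (50, 50, 40))
print(unwrapped.shape)
```

In the demo, each row of the unwrapped array corresponds to one normalised radius, so its values stay close to a single distance from the centre.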


The normalised iris pattern is then demodulated using quadrature 2D Gabor wavelets to
compress the data and retrieve information about its phase. The quantisation process is
shown in Equation 2.3 below. Two bits represent which quadrant of the complex plane each
phasor lies in. The first bit corresponds to the real part, and the second bit to the imaginary
part. Each bit is one if the corresponding component of the 2D integral is positive and zero if
it is negative, as sgn takes the sign. $I(\rho, \phi)$ is the raw iris image in a dimensionless
polar format that accounts for pupil dilation. $\alpha$ and $\beta$ are wavelet parameters,
$\omega$ is the wavelet frequency and $(r_0, \theta_0)$ are the polar coordinates on the iris
at which $h_{\{Re, Im\}}$ is computed. A 256-byte template consisting of 2048 phase bits
(1024 $\{Re, Im\}$ pairs) is generated from this process.


$$h_{\{Re, Im\}} = \mathrm{sgn}_{\{Re, Im\}} \int_\rho \int_\phi I(\rho, \phi) \, e^{-i\omega(\theta_0 - \phi)} \cdot e^{-(r_0 - \rho)^2 / \alpha^2} \cdot e^{-(\theta_0 - \phi)^2 / \beta^2} \, \rho \, d\rho \, d\phi \tag{2.3}$$
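
The quantisation step alone can be sketched as follows, assuming the complex filter responses have already been computed. The function name `quantise_phase` is hypothetical, and treating a component of exactly zero as positive is an assumption the text does not settle.

```python
import numpy as np

def quantise_phase(coeffs):
    """Map complex filter responses to interleaved (Re, Im) bit pairs."""
    coeffs = np.asarray(coeffs, dtype=complex).ravel()
    bits = np.empty(coeffs.size * 2, dtype=np.uint8)
    bits[0::2] = coeffs.real >= 0  # first bit: sign of the real part
    bits[1::2] = coeffs.imag >= 0  # second bit: sign of the imaginary part
    return bits

# One phasor per quadrant of the complex plane.
print(quantise_phase([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]))
```

With this encoding, 1024 phasors produce 2048 bits, which pack into 256 bytes.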


2.1.2 Libor Masek 


Libor Masek developed an open-source iris authentication system based mainly on Daugman’s
work [24,25].


Masek uses Hough transforms to detect the boundary between the iris and sclera and the
boundary between the pupil and iris. Hough transforms are similar to integro-differential
operators in that both use the first derivatives of the image and search for geometric
parameters. Hough transforms have the benefit of being more robust to noise in the image,
such as reflections. This is because the integro-differential operator works on a local scale,
whereas the Hough transform works on a more global scale. Edge maps are created using Canny
edge detection. As with Daugman’s integro-differential operator, the radius and centre
coordinates are calculated for the iris and the pupil. The pupil search is performed after the
iris search, as the pupil is always inside the iris.


A linear Hough transform is used to detect the eyelids. A line is fitted to each eyelid with
the linear Hough transform. Then a second line is fitted, which intersects the first at the iris
edge closest to the pupil. Canny edge detection is also used for this. The benefit of a linear
Hough transform over a parabolic transform is that it is less computationally intensive. The
downside is that it simplifies the model of the eyelids, which are curved in nature.
Consequently, some information about the iris is lost because it is marked as noise. With the
more powerful hardware available today, a parabolic transform is a more viable solution.
Eyelashes, eyelids and areas of the iris with specular reflections are marked as noise.


Masek uses Daugman’s rubber sheet model, described in Section 2.1.1, to normalise the iris.
Vectors are cast from the centre of the pupil at angles dictated by the angular resolution
$\theta$, and each is subdivided according to the radial resolution $r$ to give equidistant
points along the given vector. The distance between points is given by the Euclidean distance
between the edge of the iris and the edge of the pupil divided by $r$, and therefore varies
with angle, as the pupil is not necessarily located in the centre of the iris. $(p_x, p_y)$
denotes the coordinates of the pupillary centre in Figure 2.3.


Figure 2.3: Daugman’s rubber sheet model 


Feature encoding is performed by convolving the normalised iris pattern with 1D Log-Gabor
wavelets. Each row of the 2D normalised iris pattern is taken as a 1D signal and has a Fast
Fourier Transform applied to it before being convolved with 1D Log-Gabor wavelets.
Independence is greater in the angular direction than in the radial direction; hence Masek
processes the rows rather than the columns. Masek uses Daugman’s method of phase
quantisation with four levels. Recall that each point is encoded as two bits, one for the real
component and one for the imaginary component.


2.1.3 Philip Braddish


Philip Braddish made improvements to Masek’s iris authentication system for his thesis [5]. 
Braddish then used the improved system to store vaccination records on the blockchain. Of 
the improvements, some are bug fixes, and some are more substantial. The iris extraction 
and hashing algorithm used for this final year project is taken from the distributed ledger 
technology (DLT) based COVID-19 passport presented by Sarang Chaudhari et al. [9]. 


The first bug found was in Masek’s implementation of the normalisation process from
Daugman’s rubber sheet model. One of the angles was sampled twice; hence the normalised iris
pattern had a duplicated column. This was fixed by sampling over the half-open interval
$[0, 2\pi)$ instead of the closed interval $[0, 2\pi]$, as the angles $0$ and $2\pi$ are the
same. Another bug was found in the calculation of the average intensity of the iris. In Masek’s
code, corrupted pixels are replaced with the average intensity of the iris as a whole before
the encoding process. Braddish subsequently fixed this.


The first of the more substantial changes was to speed up the program. While MATLAB
is an excellent choice for image processing, it suffers in speed and efficiency because, like
Python, it is an interpreted language [26]. The code was vectorised and refactored to take
advantage of MATLAB’s highly efficient matrix and vector operations. The next improvement was
to the eyelid detection process. Recall that Masek uses a linear Hough transform to detect the
eyelids, and while performant, it destroys part of the data from the iris. Results obtained
with the U.S. National Institute of Standards and Technology’s (NIST) VASIR iris authentication
system suggest that splitting each eyelid into three sections improves performance [22].
Braddish uses this approach and performs a linear Hough transform on each section of each
eyelid, for a total of six linear Hough transforms per eye instead of two. This improves the
system’s accuracy, as less of the iris is marked as corrupted. This improvement is made viable
by the speedup obtained from vectorising the code.


2.1.4 Iris Comparison


The primary technique used in the literature to compare iris scans is Hamming distance.
Daugman, Masek and Braddish all use Hamming distance to match irises, as they use bit
templates [5,11,24]. Hamming distance measures the number of bit disagreements at the same
positions across two binary vectors, divided by the length of the vectors $N$.


$$HD = \frac{1}{N} \sum_{j=1}^{N} X_j \oplus Y_j \tag{2.4}$$

If two iris templates are independent, their bit patterns are effectively random with respect
to each other, so the average Hamming distance between the templates of two different irises
is 0.5. Daugman corroborated this in his findings with his iris authentication system. The
three systems require the template masks to be handled correctly. The masks of the two
templates being compared are combined with a logical OR, the result is negated with a logical
NOT, and this is then ANDed with the compared template bits. This ensures that only the bits
that are non-corrupted in both templates are compared. As only the non-corrupted bits are
compared, the number of bits corrupted in either template is subtracted from the total number
of bits. This process is shown in the modified Hamming distance formula in Equation 2.5.
$X_j$ and $Y_j$ are the compared bits, and $Xm_j$ and $Ym_j$ are the corresponding mask bits.


$$HD = \frac{1}{N - \sum_{k=1}^{N} \left( Xm_k \lor Ym_k \right)} \sum_{j=1}^{N} \left( X_j \oplus Y_j \right) \land \lnot \left( Xm_j \lor Ym_j \right) \tag{2.5}$$
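
The plain and masked Hamming distances can be sketched as below. The function names are hypothetical, a mask bit of 1 is assumed to mark a corrupted position, and raising an error when no valid bits remain is an illustrative choice rather than the behaviour of any of the three systems.

```python
import numpy as np

def hamming_distance(x, y):
    """Fraction of positions at which two equal-length bit vectors differ."""
    x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
    return np.count_nonzero(x ^ y) / x.size

def masked_hamming_distance(x, y, x_mask, y_mask):
    """Hamming distance over the bits that are uncorrupted in both templates."""
    x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
    x_mask, y_mask = np.asarray(x_mask, dtype=bool), np.asarray(y_mask, dtype=bool)
    valid = ~(x_mask | y_mask)        # NOT (Xm OR Ym)
    n_valid = np.count_nonzero(valid)  # N minus the number of masked bits
    if n_valid == 0:
        raise ValueError("no uncorrupted bits left to compare")
    return np.count_nonzero((x ^ y) & valid) / n_valid

x = [1, 0, 1, 1]
y = [1, 1, 0, 1]
print(hamming_distance(x, y))
print(masked_hamming_distance(x, y, [0, 1, 0, 0], [0, 0, 0, 0]))
```

In the example, two of four bits differ, giving 0.5. Masking the second position removes one of the two disagreements and one bit from the denominator, giving 1/3.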


An alternative to Hamming distance is the Weighted Euclidean Distance. This is used when
the iris templates consist of numeric values rather than bits. Zhu et al. make use of the
Weighted Euclidean Distance in their paper for comparing irises [50].


(2.6)


Before irises are compared, a set of shifted templates must be constructed from the original
template to account for rotational inconsistencies. This is prudent, as two iris scans of the
same eye, even under similar conditions, are not necessarily the same. One source of variation
is the rotation of the iris. The eye can rotate in the eye socket, and the user’s head might be
tilted at a slightly different angle. All three iris authentication systems covered here
account for rotational inconsistencies in the same way. The simplest method is to convert the
2D template array into a 1D array by column concatenation. The 1D array is then circularly
shifted left and right by twice the radial resolution, several times in each direction. Each
shift rotates the circular template by $\frac{360}{A_r}$ degrees, where $A_r$ is the angular
resolution. The Hamming distance between the template being compared and each shifted
alignment is recorded, and the smallest distance is used to decide whether the template is a
match. A threshold Hamming distance is used to determine whether two templates come from the
same eye.
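
The shift-and-compare procedure can be sketched as follows. This is a simplified illustration that ignores the noise masks; the name `best_shift_distance`, the default of eight shifts in each direction and the example resolutions are assumptions.

```python
import numpy as np

def best_shift_distance(template, other, step, max_shifts=8):
    """Smallest Hamming distance over circular shifts of a 1D bit template.

    step is the number of bits per rotation step, i.e. twice the radial
    resolution for a column-concatenated template.
    """
    template = np.asarray(template, dtype=bool)
    other = np.asarray(other, dtype=bool)
    best = 1.0
    for k in range(-max_shifts, max_shifts + 1):
        shifted = np.roll(template, k * step)
        best = min(best, np.count_nonzero(shifted ^ other) / other.size)
    return best

rng = np.random.default_rng(0)
radial_res, angular_res = 20, 240
enrolled = rng.integers(0, 2, size=2 * radial_res * angular_res).astype(bool)
rotated = np.roll(enrolled, 3 * 2 * radial_res)  # same eye, rotated by three steps
stranger = rng.integers(0, 2, size=enrolled.size).astype(bool)
print(best_shift_distance(enrolled, rotated, step=2 * radial_res))
print(best_shift_distance(enrolled, stranger, step=2 * radial_res))
```

The rotated copy is recovered exactly at one of the tested shifts, while the unrelated random template stays close to 0.5 at every alignment.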


2.2 Locality Sensitive Hashing 


Modern hash functions such as SHA-3 exhibit the avalanche effect, which means that a
single-bit change in the input flips each output bit with a probability of 50%, so that on
average half of the output bits change. This is a security feature, as it prevents an observer
from gaining information about the input from the output. LSH algorithms differ in that
similar inputs are hashed to similar outputs. Due to their deviation from the avalanche effect,
they cannot be used in settings where the security of the input is vital. LSH algorithms are
typically used instead to solve problems such as finding similar websites, comparing genomes
and comparing images efficiently. S. Chaudhari et al. use an LSH algorithm to compare hashed
irises registered on the blockchain for a COVID-19 vaccination passport [9].
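
The avalanche effect can be observed directly with a short sketch using SHA3-256 from Python’s standard library. The input string and the helper name `bit_difference` are arbitrary illustrative choices.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

message = bytearray(b"iris template")
digest_before = hashlib.sha3_256(message).digest()
message[0] ^= 0x01  # flip a single input bit
digest_after = hashlib.sha3_256(message).digest()
print(bit_difference(digest_before, digest_after), "of 256 output bits differ")
```

The count should land near 128, half of the 256 output bits, whereas an LSH algorithm would be expected to change only a few output bits for such a small change in the input.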


For a hash function to be secure, it must have the following three properties: preimage
resistance, second-preimage resistance and collision resistance [43]. LSH algorithms do not
possess all of these properties. They are not second-preimage resistant, as information about
the input can be derived from the output. They are not collision-resistant either, as producing
collisions for similar inputs is the entire purpose of their existence; collision resistance
would also imply second-preimage resistance. There are scenarios where the preimage is
protected in LSH, as in SimHash and S3Hash.


2.2.1 SimHash 


SimHash is an LSH algorithm proposed by Charikar that estimates the cosine distance between
vectors [8]. Random projection hashing is the algorithmic technique of measuring the
similarity of vectors by measuring the angle between them. $\cos(\theta)$ can be estimated by
$1 - \frac{\theta}{\pi}$ when the angle is small. Thus the following expression is derived to
show how SimHash can be employed to calculate the similarity between two vectors $\vec{x}$
and $\vec{y}$.


$$P(h(\vec{x}) = h(\vec{y})) = 1 - \frac{\theta(\vec{x}, \vec{y})}{\pi} \tag{2.7}$$


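A set of random vectors $R^d$ is drawn and used in the calculation of SimHash; each vector
defines a hyperplane through the origin. A random vector $\vec{r}$ is chosen from the set and
then used in the hash function $h_{\vec{r}}(\vec{u})$ defined in Equation 2.8.


$$h_{\vec{r}}(\vec{u}) = \begin{cases} 1 & \text{if } \vec{r} \cdot \vec{u} \geq 0 \\ 0 & \text{if } \vec{r} \cdot \vec{u} < 0 \end{cases} \tag{2.8}$$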


Each individual calculation produces a single bit. Vectors are repeatedly taken from $R^d$,
hashed with $\vec{u}$, and the resulting bits concatenated until the desired output length is
obtained. The Hamming distance between $h(\vec{x})$ and $h(\vec{y})$ represents the similarity
between the two vectors.
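
A minimal SimHash sketch is shown below. Drawing the random vectors from a standard normal distribution, the dimensions used and the name `simhash` are assumptions made for illustration.

```python
import numpy as np

def simhash(u, planes):
    """One output bit per random vector: 1 if r . u >= 0, otherwise 0."""
    return (planes @ u >= 0).astype(np.uint8)

rng = np.random.default_rng(0)
planes = rng.standard_normal((4096, 64))  # 4096 random vectors, so 4096 output bits
x = rng.standard_normal(64)
y = x + 0.3 * rng.standard_normal(64)     # a vector similar to x

hd = np.mean(simhash(x, planes) != simhash(y, planes))
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(hd, theta / np.pi)
```

The two printed values should be close, since each output bit differs with probability $\frac{\theta}{\pi}$, which is the complement of the agreement probability given above.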


2.2.2 S3Hash


Figure 2.4: $1 - \frac{\theta}{\pi}$ is a good estimate of $\cos(\theta)$ at small angles


S3Hash, a variation of SimHash proposed by S. Chaudhari et al., is used for comparing iris
scans in their blockchain-based COVID-19 vaccination passport [9]. It allows the comparison
of binary input vectors using the same procedure as SimHash. Random vectors in
$\{-1, 0, 1\}^n$, where $n$ is the input vector length, are added to the set $R$. Binary input
vectors in $\{0, 1\}^n$ are hashed in S3Hash, hence the sampling of the random vectors from
$\{-1, 0, 1\}^n$. The sign function sgn returns 0 if the result of $\langle x, r_i \rangle$ is
negative and 1 otherwise, and is virtually the same as $h_{\vec{r}}(\vec{u})$ in SimHash.
$\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors in this case. The
vectors used across the hashes should have the property that unrelated inputs have an average
Hamming distance of 0.5. The vectors used for each set of data must be trained so that a set
of vectors possessing this property is built. It is recommended to use vectors that hash
between 40% and 60% of the irises in the dataset to the same bit value. This ensures that
unrelated inputs have a Hamming distance of 0.5 on average.


$$\mathrm{S3Hash}_R(x) = \left( \mathrm{sgn}(\langle x, r_1 \rangle), \ldots, \mathrm{sgn}(\langle x, r_m \rangle) \right) \tag{2.9}$$
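
The hash itself can be sketched as below. The sketch uses untrained, uniformly random vectors, so the property that unrelated inputs sit at a Hamming distance of 0.5 is not expected to hold here; the sizes `n` and `m` and the name `s3hash` are assumptions for illustration only.

```python
import numpy as np

def s3hash(x, R):
    """Hash a binary vector x in {0,1}^n with the rows of R in {-1,0,1}^n."""
    return (R @ x >= 0).astype(np.uint8)  # sgn: 0 if negative, 1 otherwise

rng = np.random.default_rng(1)
n, m = 2048, 512                      # input length and output length (assumed)
R = rng.integers(-1, 2, size=(m, n))  # untrained random vectors in {-1, 0, 1}^n

x = rng.integers(0, 2, size=n)
x_noisy = x.copy()
x_noisy[rng.choice(n, size=100, replace=False)] ^= 1  # flip 100 input bits
z = rng.integers(0, 2, size=n)                         # an unrelated input

hd_noisy = np.mean(s3hash(x, R) != s3hash(x_noisy, R))
hd_unrelated = np.mean(s3hash(x, R) != s3hash(z, R))
print(hd_noisy, hd_unrelated)
```

The noisy copy of the input hashes much closer to the original than the unrelated input does, which is the locality-sensitive behaviour the scheme relies on.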
A hash function is input hiding if it is computationally hard or information-theoretically
impossible to learn the input, or any information about it, given only the hash function and
the output. S3Hash is shown to be input hiding given $H(X) > m + \lambda$, where $H(X)$ is the
entropy of the input, $m$ is the output length and $\lambda$ is a security factor. For any
reasonably sized iris template input and a short output length, this inequality holds.


2.3 Encryption 


Encryption is the cryptographic technique of converting plaintext into ciphertext. Modern
encryption techniques use either public-key cryptography or symmetric key cryptography.
Public-key cryptography involves the use of a pair of keys. A person encrypts plaintext
with the intended recipient’s public key, and the recipient decrypts it with their secret key.
Symmetric key cryptography uses a single secret key for both encryption and decryption. The
history of encryption goes back to ancient times, when simple ciphers were used. The Caesar
cipher, named after Julius Caesar, was used by him to communicate with his generals. The
cipher involves shifting each character of the message up or down the alphabet by a fixed
number of positions. Modern cryptography involves applying mathematical operations to the
plaintext to convert it into ciphertext.


2.3.1 AES 


Advanced Encryption Standard (AES) is an encryption specification established by NIST
in 2001 [33]. AES effectively replaced the Data Encryption Standard (DES) as the most widely
used symmetric encryption scheme. DES is no longer considered secure due to its
susceptibility to brute force attacks.


Several rounds are performed depending on the key length used. AES supports 128-bit,
192-bit and 256-bit keys, for which 10, 12 and 14 rounds are performed respectively. A
high-level description of the algorithm is given below.


1. Key Addition layer: A 128-bit round key derived from the key is XORed byte by byte with
the state.


2. Byte Substitution layer: Each byte in the state is non-linearly transformed by replacing it
with another byte according to a lookup table.


3. Diffusion layer:


• ShiftRows: The bytes of the last three rows are cyclically shifted a set number of
positions to the right. The second row is shifted three positions, the third row two, and
the last row one.


• MixColumns: A linear transformation mixes each column of the state matrix.


The state is a block of 128 bits split into bytes, organised into a 4x4 matrix in column-major
order. Each step in the algorithm is called a layer. The last round does not include the
MixColumns operation. Decryption is performed by applying the inverse of each operation in
reverse order.
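
The state layout and the ShiftRows step can be illustrated with a short sketch. It covers only this one layer, not the full cipher, and the function name `shift_rows` and the example block are illustrative.

```python
import numpy as np

def shift_rows(state):
    """Cyclically shift rows 1, 2 and 3 to the right by 3, 2 and 1 positions."""
    out = state.copy()
    for row, shift in zip((1, 2, 3), (3, 2, 1)):
        out[row] = np.roll(state[row], shift)
    return out

# A 16-byte block laid out column by column into the 4x4 state matrix.
block = bytes(range(16))
state = np.frombuffer(block, dtype=np.uint8).reshape(4, 4, order="F")
print(state)
print(shift_rows(state))
```

A right shift by three positions in a four-byte row equals a left shift by one, so this matches descriptions of ShiftRows that shift the rows left by one, two and three positions.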


2.3.2 RSA 


RSA (Rivest-Shamir-Adleman) is an asymmetric encryption scheme used for secure data
transmission. It was invented in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman at the
Massachusetts Institute of Technology. It is still widely used today, although with much
larger key sizes. Whitfield Diffie and Martin Hellman are credited with the idea of asymmetric
cryptosystems.
RSA follows four main steps. 


1. Key Generation:

(a) Two large prime numbers $p$ and $q$ are chosen. $n$ is computed as $p \cdot q$. $n$ is
used as the modulus for the public and private keys and is public.

(b) Compute the lowest common multiple of $p - 1$ and $q - 1$. Call it $\lambda(n)$. It is
kept secret.

(c) An integer $e$ is chosen that is coprime with $\lambda(n)$ and is less than
$\lambda(n)$. $e$ has a small bit-length and is public.

(d) Compute $d \equiv e^{-1} \pmod{\lambda(n)}$. $d$ is the private key exponent. The
public key consists of $n$ and $e$, and the private key consists of $d$.

2. Distribution: The public key is distributed to all parties wanting to transmit a secret
message $M$.

3. Encryption: $M$ is converted into an integer $m$, and padding is added. The ciphertext
$c$ is computed.

$$c \equiv m^{e} \pmod{n}$$

4. Decryption: $m$ can be recovered from $c$ by using the private key exponent. The padding
scheme is reversed to retrieve $M$.

$$c^{d} \equiv (m^{e})^{d} \equiv m \pmod{n}$$
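
The four steps above can be traced with deliberately tiny numbers. The following Python sketch is illustrative only: the primes 61 and 53, the exponent 17 and the message 65 are assumptions chosen for readability, no padding is applied, and real deployments use primes of a thousand bits or more. The modular inverse via three-argument `pow` requires Python 3.8 or later.

```python
from math import gcd

# 1. Key generation with toy primes (assumed values, far too small for real use).
p, q = 61, 53
n = p * q                                       # public modulus, 3233
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p - 1, q - 1) = 780, kept secret
e = 17                                          # public exponent
if gcd(e, lam) != 1:
    raise ValueError("e must be coprime with lambda(n)")
d = pow(e, -1, lam)                             # private exponent: e * d = 1 (mod lambda(n))

# 2. Distribution: (n, e) is published, d is kept private.

# 3. Encryption of an already-encoded message integer m < n (no padding here).
m = 65
c = pow(m, e, n)

# 4. Decryption with the private exponent recovers m.
recovered = pow(c, d, n)
```

Since no randomness is involved, encrypting the same `m` always gives the same `c`, which is the determinism that padding schemes are designed to remove.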


The security of RSA rests on the difficulty of finding the prime factors of very large
numbers. Over the years, several vulnerabilities have been discovered and countermeasures
implemented to keep RSA secure. Several attacks exist on unpadded RSA. An attacker can
guess many plaintexts, encrypt them with the public key and compare them to the ciphertext.
This is possible as unpadded RSA is deterministic and has no random component. Using a low
value for the public exponent e also has issues. The lowest possible value for e is 3. When
the same plaintext is encrypted several times with different public keys that share a low
value of e, the plaintext can be recovered by solving the resulting system of equations
[28]. Using a suitable padding scheme ameliorates the issues associated with unpadded RSA.


2.4 Blockchain


A blockchain is an immutable distributed ledger where new transactions are included in
blocks appended to a list of blocks at an interval. Blockchain technology came to fruition in
2008 when Bitcoin was invented by an anonymous entity called Satoshi Nakamoto [29]. A
transaction is broadcast to a node on the network and is then propagated to all participating
nodes. Nodes on the network compete to add the next block through a consensus mechanism
such as Proof of Work (PoW) or Proof of Stake (PoS). Nodes are incentivised to participate
on the network by earning a reward if they succeed in adding a block.


2.4.1 Bitcoin 


Bitcoin is described as “A Peer-to-Peer Electronic Cash System” by Satoshi Nakamoto in the
Bitcoin whitepaper [29]. The primary motivation behind the Bitcoin whitepaper was to
provide a mechanism for digital payments to be sent between individuals without the use of a
central authority such as a financial institution. Digital signatures alone are not
sufficient, as payees cannot verify that the money was not double-spent. A central authority
could validate that double-spending does not occur, but that is a centralised solution.


Bitcoin prevents double-spending by securing the network with nodes. The longest chain
of blocks is taken to be the majority decision on the network. Provided that honest nodes
control more than 50% of the network hash rate, the honest chain will outgrow any competing
chains. The hash rate is the sum of all of the computation power on the network at a
given time. PoW is the consensus mechanism used on the Bitcoin network. It involves a race
to find a nonce that, when included in the block header, makes the header hash to a number
with a certain number of leading zeros. A block reward is awarded to whoever succeeds in
finding that nonce value, which is easily verifiable by other nodes on the network. This
reward halves every 210,000 blocks (roughly four years) in an event dubbed the “halving”.
This mechanism was coded in to reduce the inflation of Bitcoin over time. The last Bitcoin is
expected to be mined around 2140. Miners will then receive only transaction fees from other
users. The number of leading zeros is adjusted depending on the network’s hash rate to
achieve a block time of ten minutes. Bitcoin uses the SHA2-256 hash function.
Application-specific integrated circuits (ASICs) are commonplace in Bitcoin mining due to
their strong capabilities in computing hashes. The downside to PoW is the large amount of
energy used for securing the network.
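
The nonce search at the heart of PoW can be sketched in a few lines of Python. This is a toy model under stated assumptions: the header is an arbitrary byte string rather than Bitcoin's real 80-byte header layout, the difficulty is expressed as a number of leading zero bits instead of Bitcoin's compact target encoding, and the function names are hypothetical. The double application of SHA2-256 mirrors what Bitcoin does.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def pow_hash(header: bytes, nonce: int) -> bytes:
    """Double SHA2-256 over the header with the nonce appended (toy layout)."""
    candidate = header + nonce.to_bytes(8, "little")
    return hashlib.sha256(hashlib.sha256(candidate).digest()).digest()


def mine(header: bytes, difficulty_bits: int) -> int:
    """Try nonces in order until the hash has enough leading zero bits."""
    nonce = 0
    while leading_zero_bits(pow_hash(header, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification needs a single hash, however long the search took."""
    return leading_zero_bits(pow_hash(header, nonce)) >= difficulty_bits


nonce = mine(b"toy block header", 12)   # about 2**12 attempts on average
```

Each additional required zero bit doubles the expected number of attempts, which is the lever the network uses to hold the block time near ten minutes as the hash rate changes.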


Bitcoin uses a scripting language called Script. It is not Turing-complete and does not
support loops. Instructions are specified through a series of opcodes used for actions on the
network, such as signing transactions, hashing and comparing keys. Bitcoin is therefore
somewhat limited in its functionality, and sending and receiving Bitcoin is the primary use
case.


2.4.2 Ethereum 


Vitalik Buterin published the Ethereum whitepaper in 2014 [6]. Buterin described it as “A 
Next-Generation Smart Contract and Decentralised Application Platform”. Participants on 
the Ethereum blockchain have accounts with a unique 20-byte address. Each account has 
a nonce that indicates the transaction count, an ether balance, contract code and storage. 
Ether is the cryptocurrency used to pay for transaction fees on the Ethereum blockchain. 
Transactions are signed by the account’s private key and are sent to another address on the 
blockchain. Transactions are used to send Ether to other accounts or interact with smart 
contracts. Transactions have a “gas” fee proportional to the complexity of the instructions
executed by validating nodes. Each byte of data sent in a transaction also costs gas, which
deters the blockchain from being filled with junk.


At the time of writing, PoW is the consensus mechanism used on Ethereum. Graphics processing
units (GPUs) are typically the most sought-after devices for mining on Ethereum, as opposed
to ASICs on the Bitcoin network. Ethereum 2.0 is a highly anticipated upgrade to Ethereum.
The main goals of Ethereum 2.0 are to increase the scalability of the blockchain into the
future and to move the blockchain to a PoS consensus mechanism. The transition to PoS is
anticipated to occur in Q3 of 2022 in an event dubbed “The Merge”, which will see the Beacon
Chain merged into the main Ethereum network [13]. By design, Ethereum is a more versatile
blockchain than Bitcoin. The Ethereum Virtual Machine (EVM) is the runtime environment
for executing transactions in Ethereum. The EVM has a Turing-complete instruction set that
allows smart contracts to be easily written and deployed on the blockchain. This has led to a
plethora of decentralised apps (DApps) appearing on the network, with use cases ranging
from non-fungible tokens (NFTs) to decentralised finance (DeFi).


2.4.3 Smart Contracts 


A smart contract is a program with a predefined transaction interface that executes auto- 
matically. Smart contracts are deployed on the blockchain from an account. Smart contracts 
are immutable, meaning once they are deployed to a certain contract address, the underly- 
ing code cannot be changed; only the inner state can be changed. This has led to the usage 
of “upgradeable” smart contracts, which use a proxy contract to call another smart contract 
with the underlying functionality. Smart contracts on Ethereum are most commonly written 
in Solidity; however, they can be written in any language that compiles to bytecode that can 
be executed by the EVM [47]. 


2.5 Decentralised File Storage 


The distribution of files across the web has classically relied upon dedicated servers.
Napster and Gnutella were some of the first peer-to-peer file-sharing applications to become
mainstream. Napster launched on June 1, 1999, and focused heavily on digital audio file
distribution. However, it was short-lived due to legal issues arising from copyright
infringement. While Napster was the first step in decentralised file storage, it still
relied heavily on a central authority to index the files being shared by peers on the
network. Once the central servers have returned the list of peers sharing a file, the
requesting user can download the file directly from those peers.


Gnutella was launched in 2000 by Nullsoft, shortly after the company’s acquisition by AOL.
Shortly after being launched, the client was taken down over legal concerns. However, the
protocol was reverse-engineered, and clones became available soon thereafter. In contrast
with Napster, Gnutella’s protocol does not rely on a central authority to maintain and index
files on the network. Instead, peers on the network form an overlay network. To locate a
file, a user sends a query packet to neighbouring nodes on the network. Each neighbour checks
if it has the file in question stored locally. If it does, it responds with a query response
packet. It then forwards the query packet to its own neighbours, and so on. The query
propagates through the network until the time-to-live (TTL) on the query packet is exceeded.
A separate ping and pong mechanism is used to discover new nodes on the network.
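
The TTL-limited flooding described above can be modelled with a breadth-first traversal of the overlay. The Python sketch below is a simplification under assumptions: the overlay is a hypothetical six-peer graph, the names `flood_query`, `overlay` and `holders` are illustrative, and real Gnutella details such as message identifiers and routing of responses back along the query path are left out.

```python
from collections import deque


def flood_query(overlay, holders, origin, ttl):
    """Return the peers that answer a query flooded from `origin`.

    overlay: dict mapping each peer to the list of its neighbours
    holders: set of peers that store the requested file
    ttl:     number of hops the query may travel
    """
    hits = set()
    seen = {origin}                    # peers that have already handled the query
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue                   # TTL exhausted, do not forward further
        for neighbour in overlay[peer]:
            if neighbour in seen:
                continue               # duplicate queries are dropped
            seen.add(neighbour)
            if neighbour in holders:
                hits.add(neighbour)    # neighbour would send a query response
            frontier.append((neighbour, remaining - 1))
    return hits


# Hypothetical overlay: A - B - D - F and A - C - E
overlay = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
    "D": ["B", "F"], "E": ["C"], "F": ["D"],
}
holders = {"D", "F"}
```

With `ttl=2` the query from A reaches D but stops before F, so a file held only by distant peers can go unfound. This is the price Gnutella pays for avoiding a central index.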


Figure 2.5: Comparison of the Napster and Gnutella Architecture


2.5.1 InterPlanetary File System 


IPFS is a decentralised peer-to-peer file storage network. IPFS was released by Protocol Labs 
in 2015. IPFS has millions of daily users and over 250,000 daily active nodes [38]. Files stored 
on the network are uniquely identified using content-addressing. A distributed hash table 
(DHT) is used to locate which peers host files on the IPFS network, and the Merkle directed 
acyclic graph (DAG) is used to structure the directories [17]. 


The content identifier (CID) is a label that points to files in IPFS. It is based on the
cryptographic hash of the file’s contents. This allows IPFS to remove duplicate files on the
network as they share the same CID. It does not indicate where the file is stored but rather
what its content is. When a file on the IPFS network gets updated, a new CID is created for
it. This effectively serves as the network’s version control. There are currently two
versions of the CID, CIDv0 and CIDv1. The CID is typically encoded with Base58 to shorten
the length of the string.

CIDv0 = base58btc(<multihash>) 


CIDv1 = <base>base(<cid_version><ipld-format><multihash> ) 


In CIDv0, the Interplanetary Linked Data (IPLD) format, CID version and encoding are
implicit in the string and cannot be changed, whereas in CIDv1 these parameters are explicit
and must be included for a valid CID. Multihash is a self-describing hash format. The hash
function used is based on the security requirements. Currently, Multihash on IPFS uses
SHA2-256. The hash code and digest length are prefixed to the hash digest value. The hash
code is 0x12, which represents SHA2-256, and the digest length is 0x20, which represents the
length of the digest in bytes (32).
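
The multihash layout and the CIDv0 encoding can be reproduced with the Python standard library. The sketch below makes one important simplifying assumption: it hashes the raw payload directly, whereas IPFS hashes the encoded block that wraps the data, so the strings produced here will not match the CIDs that IPFS reports for the same bytes. The function names are hypothetical, and Base58 is implemented by hand because the standard library does not provide it.

```python
import hashlib

# Bitcoin-style Base58 alphabet (no 0, O, I or l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes as Base58; leading zero bytes become leading '1' characters."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def sha256_multihash(payload: bytes) -> bytes:
    """Prefix the SHA2-256 digest with its hash code (0x12) and length (0x20)."""
    digest = hashlib.sha256(payload).digest()
    return bytes([0x12, len(digest)]) + digest


def cidv0_style(payload: bytes) -> str:
    """CIDv0 = base58btc(<multihash>), applied here to raw bytes for illustration."""
    return base58btc(sha256_multihash(payload))


multihash = sha256_multihash(b"hello")
cid = cidv0_style(b"hello")
```

Because every SHA2-256 multihash begins with the bytes 0x12 0x20, every CIDv0 string is 46 characters long and begins with "Qm", which is how CIDv0 can leave its version and encoding implicit.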


The DHT is a distributed system for storing key-value pairs. A user uses the DHT to de- 
termine which peers store the desired content. The Kademlia algorithm is used to build the 
DHT in IPFS. The libp2p project provides the DHT in IPFS and handles peer connections. 
libp2p is used to query the DHT, and two queries are required to find the file location. One 
query handles finding the peers, and the other handles finding the location of those peers. 
Bitswap is used to handle the exchange of blocks between peers. The Merkle DAG structure 


efficiently verifies that the blocks received are valid. 


Merkle DAGs are similar to Merkle trees used in other blockchain technologies; however,
they have no balance requirements, and branches can re-converge. IPFS files are split into
blocks of 256 kB. That means files larger than 256 kB, such as images, are split into
256 kB blocks and stored across different peers on the network. Each of these blocks and
directories has its own CID. The benefit of the Merkle DAG is that users can verify that the
files they are downloading from the network are authentic with log(N) hashes. As the name
suggests, there cannot be cycles in the Merkle DAG. There is no way of returning to the
source directory by traversing down its descendants.


Figure 2.6: The Merkle DAG structure used in IPFS
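
The way chunking, content addressing and verification fit together can be shown with a two-level sketch in Python. It is a simplification under assumptions: plain SHA2-256 hex digests stand in for CIDs, a file node is just a newline-separated list of child identifiers rather than an IPLD object, a dictionary stands in for the network of peers, and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 256 * 1024  # chunk size assumed from the text


def block_id(data: bytes) -> str:
    """Content address of a block: here simply the SHA2-256 hex digest."""
    return hashlib.sha256(data).hexdigest()


def add_file(content: bytes, store: dict) -> str:
    """Split content into blocks, store each under its hash, return the root id."""
    child_ids = []
    for offset in range(0, len(content), BLOCK_SIZE):
        chunk = content[offset:offset + BLOCK_SIZE]
        cid = block_id(chunk)
        store[cid] = chunk             # identical chunks collapse into one entry
        child_ids.append(cid)
    node = "\n".join(child_ids).encode()
    root = block_id(node)              # the root id commits to every child id
    store[root] = node
    return root


def read_file(root: str, store: dict) -> bytes:
    """Fetch and verify every block reachable from the root."""
    node = store[root]
    if block_id(node) != root:
        raise ValueError("file node failed verification")
    content = b""
    for cid in node.decode().splitlines():
        chunk = store[cid]
        if block_id(chunk) != cid:
            raise ValueError("block failed verification")
        content += chunk
    return content
```

A reader who trusts only the root identifier can detect a corrupted or substituted block from any peer, and a file made of repeated chunks is stored once per distinct chunk.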


2.5.2 InterPlanetary Name System 


IPFS and IPNS are largely similar apart from how content is addressed. In contrast to IPFS,
which uses content-based addressing, IPNS creates an address whose content can be updated.
With IPFS, a file essentially cannot be deleted or revoked unless no peers pin the file
anymore. With IPNS, access to a file can be revoked by deleting it. Also, its contents can be
changed without changing the address. The IPNS name is the hash of a public key. New keys
can be generated to address content with a different name [18].


2.6 Key Derivation Functions


A key derivation function (KDF) is an algorithm that derives secret keys of the desired
length from a source value (secret), typically of a much shorter length. The source value
usually has good randomness but is not distributed uniformly, or attackers have some
knowledge that can reduce the search space. The derived key is cryptographically strong and
is uniformly distributed. KDFs are often used to stretch keys or derive keys of a required
length, such as for use as a symmetric key in AES. Typically, hash functions are designed to
be highly performant. The opposite is true for password-based KDFs, where the goal is to
limit an attacker’s ability to brute-force the key.


2.6.1 HKDF 


HKDF is a key derivation function based on the HMAC message authentication code that was
created in 2010 [21]. HKDF uses an extract-then-expand approach. This entails extracting
from unevenly distributed keying material (the input) and expanding it into a
variable-length output that is pseudorandom and evenly distributed. A pseudorandom key
$PRK$ is generated by the first module, the randomness extractor $XTR$. $XTR$ takes in an
optional extractor salt $XTS$ and source keying material $SKM$. The key material $KM$ is the
basis from which the cryptographic keys are generated. The second module is the
pseudorandom function $PRF^{*}$, which in HKDF is simply HMAC. It takes in the previously
generated $PRK$, an optional string of context information $CTXInfo$, which includes
key-related information, information about the application or protocol using the key
derivation function or session identifiers such as a nonce or a time, and $L$, the total
output length in bits. For example, if two 256-bit values are required, $L$ should be 512.

$$PRK = XTR(XTS, SKM)$$

$$KM = PRF^{*}(PRK, CTXInfo, L)$$

Figure 2.7: The extract-then-expand approach used in HKDF


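$$\mathrm{HKDF}(XTS, SKM, CTXInfo, L) = K(1) \,\|\, K(2) \,\|\, \dots \,\|\, K(t) \tag{2.10}$$

$$PRK = \mathrm{HMAC}(XTS, SKM)$$

$$K(1) = \mathrm{HMAC}(PRK, CTXInfo \,\|\, 0)$$

$$K(i+1) = \mathrm{HMAC}(PRK, K(i) \,\|\, CTXInfo \,\|\, i), \quad 1 \le i < t$$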


The HKDF scheme is specified in Equation 2.10. Concatenation is denoted by $\|$. A series of
concatenated keys is produced based on the input parameters. $K(t)$ is truncated to its
first $d = L \bmod k$ bits, and $t = \lceil L/k \rceil$, where $k$ is the output length of
the HMAC hash function. HKDF is a simple scheme that does not increase the effective entropy
but allows a low-entropy, unevenly distributed source to be more evenly distributed.
Effective entropy is the notion that a passphrase with lower entropy paired with a key
derivation function takes as long to brute-force as a passphrase with higher entropy. For
this reason, HKDF is unsuitable for password-based key derivation. Two schemes suitable for
this purpose will now be looked at.
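
The extract-then-expand structure can be written out with Python's `hmac` module. The sketch below follows the conventions of RFC 5869 rather than the exact indexing of Equation 2.10, which is an assumption worth flagging: the block counter is a single byte starting at 1, and the output length is given in bytes rather than bits. HMAC-SHA-256 is chosen as the underlying function for illustration, and the function names are hypothetical.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # k in the text, here 32 bytes


def hkdf_extract(salt: bytes, skm: bytes) -> bytes:
    """Extract step: PRK = HMAC(XTS, SKM). An empty salt defaults to zero bytes."""
    if not salt:
        salt = bytes(HASH_LEN)
    return hmac.new(salt, skm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand step: chain HMAC blocks K(1) || K(2) || ... and truncate to `length` bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for a one-byte counter")
    output = b""
    block = b""          # K(0) is the empty string
    counter = 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        output += block
        counter += 1
    return output[:length]


def hkdf(salt: bytes, skm: bytes, info: bytes, length: int) -> bytes:
    """Full HKDF: extract a pseudorandom key, then expand it."""
    return hkdf_expand(hkdf_extract(salt, skm), info, length)


# Two 256-bit keys from one secret, as in the example in the text (64 bytes = 512 bits).
keys = hkdf(b"salt", b"source keying material", b"context", 64)
key_a, key_b = keys[:32], keys[32:]
```

Each HMAC call is cheap, so the construction adds no meaningful cost to guessing a weak input. This is the practical reason it is not used on passwords.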


2.6.2 PBKDF2 


Password-Based Key Derivation Function version 2 (PBKDF2) is a key derivation function
created by RSA Laboratories as part of Public Key Cryptography Standards #5 (PKCS #5) [40].
PBKDF2 uses CPU-intensive operations to slow down the derivation and so hinder brute-force
attacks on weaker passwords. The basis of operation for PBKDF2 is a pseudorandom function
(PRF) such as a hash function or HMAC that is iterated many times. PBKDF2 is cycle free.
This means that an attacker cannot compute an equivalent set of less CPU-intensive
operations to derive the same key as the set of more expensive ones. PBKDF2 can derive a key
of arbitrary length dkLen, allowing the derivation of many keys from a single secret
password/passphrase p. A salt s is used to prevent attackers from using rainbow tables or
precomputation. Finally, the iteration count c is the number of times the PRF is run for
each output block. The derived key DK is output as shown in Equation 2.11.

$$DK = \mathrm{PBKDF2}(p, s, c, dkLen) \tag{2.11}$$

DK is the result of the concatenation of $\lceil dkLen/hLen \rceil$ blocks, where hLen is
the output length of the PRF.

$$DK = T_1 \,\|\, T_2 \,\|\, \dots \,\|\, T_{\lceil dkLen/hLen \rceil} \tag{2.12}$$

A block $T_i$ is computed by computing the PRF $c$ times and XORing the results together. The
previous result of the PRF is used as the salt in the ensuing calculation to ensure that all
iterations are completed by anyone using PBKDF2. This process is shown in Equation 2.13.

$$T_i = U_1 \oplus U_2 \oplus \dots \oplus U_c \tag{2.13}$$

$$U_1 = PRF(p, s \,\|\, i)$$

$$U_2 = PRF(p, U_1)$$

$$U_c = PRF(p, U_{c-1})$$
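
Equations 2.11 to 2.13 translate almost line for line into Python. The sketch below assumes HMAC-SHA-256 as the PRF and encodes the block index i as a four-byte big-endian integer, as PKCS #5 specifies. The function names are hypothetical, and the result can be checked against the standard library's `hashlib.pbkdf2_hmac`, which should be preferred in practice.

```python
import hashlib
import hmac
import struct

H_LEN = hashlib.sha256().digest_size  # hLen, 32 bytes for SHA-256


def prf(password: bytes, data: bytes) -> bytes:
    """PRF(p, x): HMAC-SHA-256 keyed with the password (assumed choice of PRF)."""
    return hmac.new(password, data, hashlib.sha256).digest()


def pbkdf2_block(password: bytes, salt: bytes, iterations: int, index: int) -> bytes:
    """T_i = U_1 xor U_2 xor ... xor U_c, with U_1 = PRF(p, s || i)."""
    u = prf(password, salt + struct.pack(">I", index))
    t = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        u = prf(password, u)               # U_j = PRF(p, U_{j-1})
        t ^= int.from_bytes(u, "big")
    return t.to_bytes(H_LEN, "big")


def pbkdf2(password: bytes, salt: bytes, iterations: int, dk_len: int) -> bytes:
    """DK = T_1 || T_2 || ... truncated to dk_len bytes."""
    n_blocks = -(-dk_len // H_LEN)         # ceil(dkLen / hLen)
    blocks = [pbkdf2_block(password, salt, iterations, i)
              for i in range(1, n_blocks + 1)]
    return b"".join(blocks)[:dk_len]


derived = pbkdf2(b"correct horse", b"per-user salt", 1000, 48)
reference = hashlib.pbkdf2_hmac("sha256", b"correct horse", b"per-user salt", 1000, 48)
```

The chain of `prf` calls inside each block is inherently sequential, so the only way to finish sooner is to compute each call faster, which is exactly where GPUs and ASICs gain their advantage over PBKDF2.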


The value for the iteration count is application-specific and changes as hardware gets
faster. In 2009, Apple used 2,000 iterations for iOS 3, while in 2021 OWASP recommended
310,000 iterations for PBKDF2 when using HMAC-SHA-256 as the PRF [32]. Password managers
such as Dashlane often use PBKDF2 to stretch a user’s master password into unique passwords
for individual websites [10]. PBKDF2 has some weaknesses worth discussing depending on the
PRF used. If SHA-1 is used, it is possible to reduce the computational overhead by
precomputing a message block. p is constant throughout each iteration; therefore, it can be
precomputed in advance to reduce the number of operations required by 50%. The user can
also take advantage of this; however, this should be seen as an optimisation implemented
in crypto libraries rather than an exploit. Secondly, some operations do not need to be
evaluated. Some XOR operations involve zero-valued operands, which do not contribute to the
output and can therefore be skipped. Additionally, some words in the message block are set
to the same value and repeated due to passwords being short. This reduces the computational
overhead for an attacker because, if the same value is XORed twice, the result is the
original value [46].


2.6.3 Argon2 


Argon2 is a key derivation function proposed by Alex Biryukov et al. that won the Password
Hashing Competition in 2015 [4]. Argon2 aims to protect against the vulnerabilities often
suffered by other key derivation functions. Argon2 employs a state-of-the-art memory-hard
function. The gains typically seen with GPUs and ASICs are not present with Argon2. This
makes specialised hardware no more performant than standard desktop computers. Argon2 can
also take advantage of multiple cores and multithreading on desktop CPU architectures and is
optimised for the x86 architecture. Three variants of Argon2 exist: Argon2d, Argon2i and
Argon2id. Argon2d uses data-dependent memory access, which exposes it to side-channel
attacks; it is therefore more suitable for cryptocurrency mining or backend server key
derivation, where such attacks are less of a concern. Argon2i instead uses data-independent
memory access and therefore does not suffer from the same side-channel attacks. Argon2id is
a combination of both of the approaches above. It is more suitable for password-based key
derivation. Argon2 fills memory rapidly, and 1 GB of RAM is filled in less than a second.


Argon2 has primary and secondary inputs. The primary inputs $P$ and $S$ are the password
and salt. The secondary inputs consist of the following: the degree of parallelism $p$, tag
length or output length $\tau$, memory size $m$, number of iterations $t$, version number
$v$, secret key $K$ and associated data $X$. The number of iterations allows programmers to
increase the running time independently of memory size, which is not the case for other
password hashers such as Scrypt [34].

$$DK = \mathrm{Argon2}(P, S, p, \tau, m, t, v, K, X) \tag{2.14}$$


The operation of Argon2 follows three main steps. 


1. The same extract-then-expand concept used in HKDF is applied. The password, salt,
secret key and any associated data are hashed with the BLAKE2b hash function to
extract entropy, giving the initial hash $H_0$.

2. Memory is then filled with a 2D matrix $B[i][j]$, consisting of $m$ 1024-byte blocks with
$p$ rows and $q = \lfloor m/p \rfloor$ columns. It is structured as shown below.

$$B[i][0] = G(H_0, 0 \,\|\, i), \quad 0 \le i < p$$

$$B[i][1] = G(H_0, 1 \,\|\, i), \quad 0 \le i < p$$

$$B[i][j] = G(B[i][j-1], B[i'][j']), \quad 0 \le i < p, \; 2 \le j < q$$

$B[i'][j']$ is determined differently for Argon2i and Argon2d, as Argon2i uses
data-independent memory access. This procedure is repeated $t$ times, and the result is
calculated by XORing the blocks of the last column. $G$ is a compression function based on
BLAKE2b rounds that takes in two 1024-byte blocks and has a 1024-byte output.

$$B_{\text{final}} = B[0][q-1] \oplus B[1][q-1] \oplus \dots \oplus B[p-1][q-1]$$

3. Finally, the output is computed. The 64-byte result of the hash of $B_{\text{final}}$ is
split into two 32-byte parts. The first 32 bytes are appended to the output, and the last 32
bytes are appended to $B_{\text{final}}$ and hashed again. This process is repeated until
the output reaches the length $\tau$.

Figure 2.8: Argon2 with one iteration


There are no published attacks on Argon2d; however, there have been two on Argon2i. The
first, proposed by Dan Boneh et al. alongside the Balloon Hashing algorithm, reduces the
memory required by a factor of four without incurring a time penalty. This was subsequently
fixed in version 1.3. The second attack, proposed by Alwen and Blocki, yields asymptotic
reductions in the amount of memory required under specific circumstances. The number of
iterations can be increased above ten to prevent this attack from succeeding, but the attack
has not yet been fixed by Biryukov et al. for lower values [2].


2.7 Related Work


There has been an increase in interest in blockchain technology since 2017. This has given
rise to several papers on EHR storage using a blockchain model. Here we will look at the
most cited of these blockchain-based solutions and why there is a need to improve this
space. We will first look at the inspiration behind this project and the basis on which it
is built.


2.7.1 Framework for a DLT Based COVID-19 Passport


Framework for a DLT Based COVID-19 Passport uses LSH to hash a user’s iris scan. The
hashed iris scan is registered on the blockchain to provide proof of vaccination [9]. A
permissioned blockchain using Proof of Authority (PoA) as the underlying consensus mechanism
facilitates registration of the hash of a user’s iris scan and stores their vaccination
records tied to the user’s $ID$ on the blockchain. The $ID$ is created on demand to retrieve
the vaccination record when requested by border control. This process involves scanning
their iris, hashing it and comparing it to other scans on the blockchain. The user’s date of
birth $DoB$ and $Gender$ are used as two-factor authentication because, when comparing vast
amounts of hashed scans, there is a small chance of a false positive. S3Hash is the LSH
algorithm employed in the framework, and it is denoted by $H_2$ below. The raw iris template
is in the form of a binary vector $fv$. $H_1$ is any modern hashing algorithm such as SHA-3.
$\|$ is concatenation.

$$ID = H_1(DoB \,\|\, Gender \,\|\, H_2(fv))$$


The iris matching accuracy was evaluated and shown to be sufficient even for large
populations, given that some form of two-factor authentication (2FA) is used in the system.
In the conclusion, the authors mention the extension of this mechanism to other use cases.
One use case mentioned was a medical record storage system. Due to the nature of medical
records, they must be encrypted before being stored. They cannot be stored directly on the
blockchain due to the inefficiencies of the blockchain in storing large amounts of data and
its immutable nature. Instead, storing them off-chain in an encrypted form is a viable
solution. This still provides a decentralised way of storing them while remaining revocable
and feasible in terms of cost. One emphasis of this project is security. More emphasis was
placed on the iris hashing than on blockchain security and privacy in the vaccination
passport framework.


2.7.2 Using Blockchain for Electronic Health Records 


Shahnaz et al. propose a framework for using blockchain technology for EHRs and storing
the records with granular access rules [42]. The framework uses smart contracts with
CRUD operations and stores EHRs off-chain on IPFS. The blockchain used is Ethereum, and
the smart contract is written in Solidity. The smart contract checks the access level of
whoever is calling the contract to see if they have the required level of access. In all
cases apart from viewing a record, the blockchain account must be the doctor’s. The only
instance where this is not the case is when a patient wants to view their medical record.


There are several problems with this solution. First of all, the records stored off-chain
are not encrypted. Therefore, anyone with the IPFS hash can view them. The paper does not
acknowledge that anyone can simply observe the input data sent to the contract and see the
added IPFS hash. Just because they might not be able to call the contract functions to
retrieve the IPFS hash does not mean the framework is secure. An attacker can operate
outside the framework’s scope to see doctors calling the contract with the IPFS hash
included in their transaction. Secondly, there is no mention that this is a private or
permissioned blockchain. If it is assumed that the contract is deployed on the public
Ethereum mainnet, this is a security issue as mentioned above. Thirdly, patients do not have
direct autonomy over their EHRs. Doctors must act in their patient’s best interest in how
they add the records and share them. Lastly, IPFS is not the best option for this framework.
IPNS should be used instead so the records can be deleted easily when no longer required.


CHAPTER 3


Design 


A high-level overview is provided of the EHR storage framework in this chapter. The com- 
ponents used in the framework are outlined below. Specifics such as algorithms are not 


provided in this chapter but will instead be looked at in Chapter 4. 


3.1 Components 


1. An iris template extractor is used to capture information about the user’s iris from 
a close-up image of their eye. As specific equipment is required to scan irises, this 
process would occur at a healthcare institution with the capability to scan irises. The 
template extraction algorithm used is efficient, and there are no long delays for the 


algorithm to extract the information. A binary feature vector template is produced. 


2. The binary vector template extracted is hashed with LSH. This is done for several rea- 
sons. First, it obfuscates the user’s raw iris information. An attacker can’t gain direct 
access to the user’s raw iris information from the hash of it. Secondly, it still allows 
for comparisons between irises to take place, given there is a standard in place across 


institutions for the LSH algorithm and parameters chosen. 


3. A blockchain is used to anonymise the actions of institutions on behalf of patients.
A series of privacy-preserving identifiers are used, which anonymise the patient. The
hashed iris template vector is registered on the blockchain by the institution. Identifiers
point to the off-chain storage location of the patient’s encrypted EHRs. The symmetric
encryption/decryption key is regenerable, meaning there is no requirement for the
patient or institution to store a secret key. Using off-chain storage allows for the EHRs
to be deleted and serves as the system’s form of revocability.


In Figure 3.1, the components used in the system are shown. The patient’s iris scan
undergoes feature extraction to generate a binary vector representation of the features of
the patient’s iris. The extracted binary vectors get hashed with LSH. The hash of the user’s
iris scan is registered on the blockchain via a smart contract. The patient’s EHRs are
encrypted and stored off-chain. Anonymous identifiers are used to refer to that storage
location and decouple it from the patient in question.

Figure 3.1: System overview


3.2 Procedure 


A state diagram is shown in Figure 3.2 to provide a visual representation of the textual
description provided.

1. A patient scans their iris at a participating healthcare institution. If they are not
registered already, the hash of the user’s scan $HoS$ is registered on the blockchain.

$$HoS = h_l(I) \tag{3.1}$$

The $HoS$ is generated by taking the patient’s extracted iris template vector $I$ and
hashing it with an LSH algorithm $h_l$. Their $PublicID$ is then created using some of their
personal information. The date of birth $DoB$ of the user is in yyyy/mm/dd form. The
$Gender$ of the user is either male, female or other. The country of birth $CoB$ is in
ISO 3166-1 alpha-2 code form (e.g. Ireland is “IE” and the United States is “US”). All this
information is concatenated together, denoted by $\|$, and hashed. The SHA-3 hash function
$h_s$ is used to hash the PII into the identifier.

$$PublicID = h_s(HoS \,\|\, Gender \,\|\, DoB \,\|\, CoB) \tag{3.2}$$

If the user is already registered and has indexed their EHRs with the system, the
procedure is much the same; however, the original $HoS$ must be retrieved from the
blockchain to correctly create the patient’s identifiers. This is done by scanning their
iris again and hashing it. Then, a set of test $PublicID$s is created using the closest
matching iris hashes registered on the blockchain. Each $PublicID$ is checked to see if it
exists. If it does, the particular $HoS$ used in the generation of that $PublicID$ was the
original one registered to that patient. The other information used in the $PublicID$
serves as 2FA, as there is a small false-positive rate in iris matching. If no $PublicID$
is found, it is deemed that the iris scan was of poor quality, and it is retaken.


2. Next, the patient’s EHRs are indexed to their PublicID. It is important that the identifier
associated with an EHR cannot be used to derive any information about where it is
stored. For this, we take the hash of the file’s contents F. The hash of the record HoR
is derived as shown in Equation 3.3.

$$HoR = h_s(F) \tag{3.3}$$

The PublicID points to a list of HoRs, each of which represents one of the individual EHRs
associated with the patient. If the patient gets more scans, blood tests etc., they add
another HoR for each record.

$$PublicID \rightarrow \{HoR_1, HoR_2, \ldots, HoR_n\} \tag{3.4}$$


3. Lastly, the patient needs to store their EHR off-chain. They are free to choose any cloud
storage provider; however, IPNS is recommended for maintaining decentralisation.
The off-chain storage allows the patient to delete the record and constitutes the
system’s form of revocability. A PrivateID is created. This is essentially the patient’s
secret key, which must never be shared with anyone. Their IPNS key can be derived
from their PrivateID. A KDF $h_k$ is used as the hash function to increase the
identifier’s effective entropy and make it resistant to brute-force attacks. A 6-digit PIN is
memorised by the patient. The patient’s social security number SSN or their country’s
equivalent is concatenated with the HoS and PIN.

$$PrivateID = h_k(HoS \,\|\, PIN \,\|\, SSN) \tag{3.5}$$

Using the PrivateID allows the patient to retrieve the encrypted EHR’s storage
location L. This process is shown in Equation 3.6. This is the decoupling step between the
patient’s PublicID and their secret PrivateID. There should be no way for an attacker
to associate a list of EHR locations with a particular patient.

$$h_s(PublicID \,\|\, PrivateID \,\|\, HoR) \rightarrow L \tag{3.6}$$

The EHR is encrypted with a symmetric-key encryption algorithm such as AES-256.
The secret key $S_k$ is generated by combining the bits of the PublicID and PrivateID
and is shown in Equation 3.7.

$$S_k = PublicID_{0..127} + PrivateID_{128..255} \tag{3.7}$$


All interactions with the blockchain are performed by the healthcare institution on
behalf of the patient. This is done to maintain security standards and to prevent instances
of the patient following the wrong procedures when interacting with the blockchain.
The healthcare institution uses single-use accounts for each interaction with the blockchain
and waits randomised amounts of time between transactions involving the same
patient to anonymise their actions on the public ledger.
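
The identifier derivations in Equations 3.2 to 3.7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's implementation: fields are assumed to be concatenated as UTF-8 bytes, SHA3-256 is taken as $h_s$, and PBKDF2-HMAC-SHA256 from the standard library stands in for the Argon2i KDF $h_k$ (with an arbitrary iteration count), salted with the PublicID. All function names and example values are hypothetical.

```python
import hashlib


def sha3(*parts: bytes) -> bytes:
    """h_s: SHA3-256 over the concatenation of the given byte strings."""
    return hashlib.sha3_256(b"".join(parts)).digest()


def public_id(hos: bytes, gender: str, dob: str, cob: str) -> bytes:
    """Equation 3.2: PublicID = h_s(HoS || Gender || DoB || CoB)."""
    return sha3(hos, gender.encode(), dob.encode(), cob.encode())


def private_id(hos: bytes, pin: str, ssn: str, salt: bytes) -> bytes:
    """Equation 3.5, with PBKDF2 as a stand-in for Argon2i (assumption)."""
    secret = hos + pin.encode() + ssn.encode()
    return hashlib.pbkdf2_hmac("sha256", secret, salt, 100_000, dklen=32)


def record_hash(file_bytes: bytes) -> bytes:
    """Equation 3.3: HoR = h_s(F)."""
    return sha3(file_bytes)


def location_id(pub: bytes, priv: bytes, hor: bytes) -> bytes:
    """Equation 3.6: anonymous identifier that maps to the storage location L."""
    return sha3(pub, priv, hor)


def secret_key(pub: bytes, priv: bytes) -> bytes:
    """Equation 3.7: bits 0..127 of PublicID followed by bits 128..255 of PrivateID."""
    return pub[:16] + priv[16:]


def find_original_hos(candidates, gender, dob, cob, registered_public_ids):
    """Return the first candidate HoS whose test PublicID is registered, else None."""
    for hos in candidates:
        if public_id(hos, gender, dob, cob) in registered_public_ids:
            return hos
    return None


# Example with made-up values; a real HoS would come from the LSH of an iris scan.
hos = bytes(64)  # placeholder 512-bit HoS
pub = public_id(hos, "female", "1990/01/31", "IE")
priv = private_id(hos, "123456", "123456789", salt=pub)
hor = record_hash(b"example record")
anon = location_id(pub, priv, hor)
key = secret_key(pub, priv)  # 256-bit key suitable for AES-256
```

The `find_original_hos` helper mirrors the returning-patient case in step 1: each close-matching HoS yields a test PublicID, and only the one that is already registered identifies the original scan.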
[Flowchart: capture iris scan S; compute H(S) = SimHash(S); find registered HoSes with hd(h, H(S)) < threshold; for each candidate HoS, compute PublicID = h(HoS || CoB || Gender || DoB) and check whether it exists; compute PrivateID = h(HoS || SSN || PIN); for each HoR under the PublicID, compute AnonID = h(PublicID || PrivateID || HoR) and decrypt the file.]

Figure 3.2: Procedure used in the system


CHAPTER 4


Implementation 


All components described in Chapter 3 are now examined more closely to detail the exact
implementation used.


4.1 Iris Extraction and Processing 


The base code is provided by Masek [24]. However, Braddish made some improvements to the
code base, detailed in 2.1.3; therefore, Braddish’s improved code was used instead [36].
The extraction technique used is based on John Daugman’s original extraction method [11]
with some adjustments, which are detailed in 2.1.2. The code is implemented in Matlab.
Braddish provides setup instructions to run the code, and it ran without much difficulty.
The only additional setup required was installing the Image Processing Toolbox at version
11.2. The add-on provides utilities for working with the iris images. The code caches the
segmented irises in the Matlab path to reduce computational costs when creating vector
templates for the iris database being used. This is useful for testing the database with
different parameters, such as the angular and radial resolution used. The code outputs a folder
with three subfolders titled “MaskedTemplates”, “Masks”, and “Templates”. “MaskedTemplates”
is simply the result of logically ANDing the “Templates” with the inversion of the
“Masks”.
4.1.1 Rotational Inconsistencies


As mentioned in 2.1.4, rotational inconsistencies in the iris images must be handled in the
comparison. The code outputs a flattened binary vector, which is the result of column
concatenation of the 2D iris template. The rows represent the angular direction, while the
columns represent the radial direction. Because two bits are used to represent each pixel of
the iris, one rotation step corresponds to rotating the flattened binary vector by twice the
radial resolution. This makes handling rotational inconsistencies easier. A set of iris
templates is created, which represents the iris in several different orientations. The binary
vector is rotated a set number of times in both the left and right directions and hashed with
the LSH $h_l$ at each orientation. This is shown in Equation 4.1. I is the extracted iris binary
vector, and the subscript indicates its rotation: 0 indicates no rotation, a positive value
indicates a right shift, and a negative value indicates a left shift. k is the number of
rotations in each direction. At each orientation, the iris is compared to the ones registered
on the blockchain. The minimum Hamming distance is used as the determining value for the
similarity between irises. The number of rotations used depends heavily on the dataset. Some
datasets have more variance between iris scans of the same eye and require more rotations;
others have little variation and require fewer. The number of rotations used is determined by
performing tests on the dataset. More rotations can improve accuracy but increase computation
costs, as more hashes must be compared, and also increase false positives. In this paper, we
look at the CASIA-Iris-Interval dataset because of its popularity, scan quality, and balanced
variation between irises [37]. More datasets are explored by Braddish [5].

$$Hashes = \{h_l(I_{-k}), \ldots, h_l(I_{-1}), h_l(I_0), h_l(I_1), \ldots, h_l(I_k)\} \tag{4.1}$$
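
The rotation set in Equation 4.1 can be illustrated with NumPy's circular shift. The sketch below is an assumption-laden illustration rather than the appendix code: the function name and toy template are hypothetical, and it assumes that one rotation step equals a circular shift of twice the radial resolution, as described above.

```python
import numpy as np


def rotated_templates(template: np.ndarray, k: int, radial_res: int) -> list:
    """Return the 2k+1 rotations I_{-k}, ..., I_0, ..., I_k of a flattened template.

    One rotation step is a circular shift of 2 * radial_res bits (two bits per
    pixel). Positive indices shift right, negative indices shift left.
    """
    step = 2 * radial_res
    return [np.roll(template, r * step) for r in range(-k, k + 1)]


# Toy example: a 12-element "template" with radial resolution 1 and k = 2.
toy = np.arange(12)
rotations = rotated_templates(toy, k=2, radial_res=1)
```

Each rotated vector would then be hashed with the LSH before comparison, giving the 2k+1 hashes of Equation 4.1.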
The masked templates are used for the iris template vectors. Various methods of handling
the masks were explored by Braddish, but ultimately combining the mask and template was
the most practical approach [5]. Ideally, Daugman’s method of handling masks would be used,
as it is the most accurate. This involves combining (ANDing) the masks of the two irises
being compared before comparing the irises. This is not possible, however, when storing the
hashes of the irises on the blockchain, as the mask of the original scan is not and cannot be
stored on the blockchain for security and size reasons. This is not considered to be a major
problem, however, as a form of 2FA is used in the iris comparison process.


4.1.2 Iris Comparison 


S3Hash is used to hash the irises. Random vectors are generated using the method
described by Braddish [5]. Random vectors with the same length as the iris template vector,
consisting of -1s, 0s and 1s, are generated. The inner product of two equal-length bit-vectors
results in a single-bit output. If a vector hashes between 40% and 60% of the irises in
the dataset to the same bit value, the vector is accepted. This is repeated until the number
of vectors in the set equals the desired hash output length. Increasing the hash length
decreases the false positive rate but linearly increases performance costs and the storage space
required on the blockchain. This translates into increased gas costs for registering the hash
of an iris scan. The hash length must be less than half of the length of the iris input
vector to guarantee S3Hash’s input hiding property [9]. A length of 512 bits provides good
accuracy and is still small enough to be input hiding and reasonable on blockchain resources.
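
The vector selection described above can be sketched as follows. This is not the S3Hash implementation from [9] or Braddish's code: it is a simplified random-projection illustration that assumes each output bit is 1 when the inner product is positive, and the function names and toy dataset are hypothetical.

```python
import numpy as np


def make_projection_vectors(dataset: np.ndarray, n_bits: int,
                            rng: np.random.Generator) -> np.ndarray:
    """Draw random {-1, 0, 1} vectors, keeping only those that split the dataset
    so that between 40% and 60% of templates map to the same bit value."""
    accepted = []
    while len(accepted) < n_bits:
        v = rng.integers(-1, 2, size=dataset.shape[1])  # values in {-1, 0, 1}
        bits = (dataset @ v) > 0                        # one bit per template
        if 0.4 <= bits.mean() <= 0.6:
            accepted.append(v)
    return np.array(accepted)


def lsh_hash(template: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    """Hash one template to len(vectors) bits, one bit per projection vector."""
    return ((vectors @ template) > 0).astype(np.uint8)


# Toy data: 200 random 128-bit "templates" hashed to 16 bits.
rng = np.random.default_rng(0)
toy_dataset = rng.integers(0, 2, size=(200, 128))
toy_vectors = make_projection_vectors(toy_dataset, n_bits=16, rng=rng)
toy_hash = lsh_hash(toy_dataset[0], toy_vectors)
```

In the system described here the output length would be 512 bits rather than the 16 used in this toy example.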


The Hamming distance is used to compare the hashes. The method of handling rotational
inconsistencies in 4.1.1 is used. The iris template is hashed with S3Hash at each rotation.
The set of hashes is compared to each registered HoS on the blockchain. A Hamming distance
threshold value is set to determine whether the irises being compared belong to the same
eye or to different eyes. The threshold is based on the desired acceptance profile of the irises.
A higher threshold decreases the need to retake an iris scan when the image is
of poor quality but increases the false positive rate. A lower threshold minimises the false
positive rate but increases the need to retake an iris scan. The exact threshold used is
discussed in Chapter 5. A new approach to iris hashing is also evaluated. Instead of hashing
just one iris, both of the user’s irises are hashed and then combined into one hash
by concatenation. This should increase the accuracy of the system because, even if one iris
scan is of poor quality, the other one should make up for the accuracy lost. The downside is
that there is an increased computational cost in the rotations, and both of the user’s irises
must be scanned. The comparison code is written in Python, and the NumPy library is used
extensively to gain performance advantages [14]. The code is provided in the appendix.
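
The thresholded comparison can be sketched as below. This is a hedged illustration rather than the appendix code: the function names are hypothetical, the Hamming distance is assumed to be normalised by the hash length, and the 0.4 threshold is the value reported in Chapter 5.

```python
import numpy as np


def hamming_fraction(a: np.ndarray, b: np.ndarray) -> float:
    """Normalised Hamming distance between two equal-length bit arrays."""
    return float(np.count_nonzero(a != b)) / a.size


def candidate_matches(query_hashes, registered, threshold=0.4):
    """Return (distance, index) pairs for registered hashes under the threshold.

    For each registered HoS, the minimum distance over all rotated query
    hashes is used. Results are sorted from closest to furthest.
    """
    found = []
    for idx, hos in enumerate(registered):
        d = min(hamming_fraction(q, hos) for q in query_hashes)
        if d < threshold:
            found.append((d, idx))
    return sorted(found)


def combine(left_hash: np.ndarray, right_hash: np.ndarray) -> np.ndarray:
    """Combined-iris variant: concatenate the hashes of both eyes."""
    return np.concatenate([left_hash, right_hash])


# Toy example with 8-bit hashes.
registered = [
    np.zeros(8, dtype=np.uint8),
    np.ones(8, dtype=np.uint8),
    np.array([1, 0, 0, 0, 0, 0, 0, 0], dtype=np.uint8),
]
query = [np.zeros(8, dtype=np.uint8)]
matches = candidate_matches(query, registered)
```

The sorted candidate list corresponds to the closest-matching hashes that are then used to build test PublicIDs.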


4.2 Blockchain


An Ethereum blockchain is used. The Bitcoin blockchain is limited by its high transaction
costs, long transaction times and smart contracts that are primitive and limited in functionality.
The high block time of the Bitcoin blockchain is a downside, as it can take over an hour for a
transaction to be confirmed. In contrast, the block time on Ethereum is approximately 13.21
seconds as of April 2022 [48]. Ethereum, by default, uses PoW for the consensus mechanism.
If a private instance of a blockchain is used specifically for this system, PoW suffers in terms
of security, as the number of validating nodes will be limited solely to healthcare institutions.
The Clique Proof of Authority (PoA) consensus protocol (EIP-225) would be a better choice
in this case [44]. Each healthcare institution has the same voting power regardless of
computational power. This would make a 51% attack far less likely. The downside to a private
instance of a blockchain is the reliance on healthcare institutions to connect to the blockchain
and act as signers. For an attack to occur on the blockchain, more than half of the signers
would have to act maliciously, which is very unlikely provided there are enough signers.


The approach chosen for this system is to use the Ethereum mainnet. Firstly, it is the most
secure option due to the large number of miners on the network. Secondly, it does not require
institutions to become signers and validate blocks on a private blockchain. Such a requirement
would deter many institutions from using the system. Lastly, Ethereum is set to move to a
PoS consensus mechanism imminently [13]. This nullifies the downsides of PoW. The first
downside is that mining pools aggregate miners’ power together. This allows pool operators
to control large amounts of power and act maliciously. The second downside is that
a large amount of energy is used in securing the network. PoS alleviates both issues. PoS
on Ethereum involves staking Ether and becoming a validator. Validators are rewarded for
good behaviour and penalised for malicious actions. Rewards and penalties exist in the form
of earning Ether or losing some or all of a validator’s staked Ether. A 51% attack on a PoS
Ethereum requires control of over 50% of the total staked Ether on the network. Attacking
the network with control of over 50% of the staked Ether is not in the attacker’s
interest. As they have such a large share of the network, they stand to lose their investment
as Ether devalues due to other market participants losing interest in interacting with the
attacked blockchain.


4.2.1 Smart Contract 


A smart contract written in Solidity is used for storing the hashed iris scans and the
privacy-preserving identifiers. The HoSs are stored in a dynamic array of structs. A struct in
Solidity follows the same idea as a struct in C, whereby multiple fields are organised into a
single variable. Each HoS is represented by a struct because the maximum integer size in
Solidity is 256 bits. A ‘left’ and a ‘right’ variable in the struct represent the most
significant 256 bits and least significant 256 bits of the HoS. When the irises are compared,
the two variables are combined to form the 512-bit HoS. Each time a new HoS is added, two
256-bit values are passed into the ‘registerHashOfScan’ function. All the HoSs can be retrieved
by calling ‘getHashOfScans’. A mapping is used to check that each HoS registered is unique.
A mapping in Solidity is similar in function to a hash map and consists of a key-value pairing
system. The left 256 bits of the iris scan map to the right 256 bits. When a new HoS is added,
the mapping is looked up with the left 256 bits, and the new right 256 bits are compared with
the value held by the mapping. This keeps the gas costs low, as iterating over many values in
an array to check for a value is computationally expensive.
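
The storage layout described above can be modelled in Python to make the split and the uniqueness check concrete. This is an illustrative model only, not the Solidity contract from the appendix: the class and method names are hypothetical, and returning False for a duplicate is an assumption about how a repeated registration is handled.

```python
MASK_256 = (1 << 256) - 1


def split_hos(hos: int) -> tuple:
    """Split a 512-bit HoS into its most and least significant 256-bit words."""
    return hos >> 256, hos & MASK_256


def join_hos(left: int, right: int) -> int:
    """Recombine the two 256-bit words into the 512-bit HoS."""
    return (left << 256) | right


class HashRegistry:
    """Python model of the array of structs plus the left -> right mapping."""

    def __init__(self):
        self.scans = []   # list of (left, right) pairs, like the struct array
        self.seen = {}    # left 256 bits -> right 256 bits

    def register(self, left: int, right: int) -> bool:
        """Store a new HoS; return False if this exact pair is already known."""
        if self.seen.get(left) == right:
            return False  # constant-time lookup instead of scanning the array
        self.seen[left] = right
        self.scans.append((left, right))
        return True


registry = HashRegistry()
left, right = split_hos((5 << 256) | 7)
first = registry.register(left, right)
second = registry.register(left, right)
```

The dictionary lookup plays the role of the Solidity mapping: the duplicate check costs the same regardless of how many scans are registered.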


A mapping from the 256-bit PublicID to a dynamic array of 256-bit HoRs is used to store
the record identifiers. The ‘addTransaction’ function is called to add a new record identifier.
The ‘getTransactions’ function is called to retrieve all the record identifiers associated with
a PublicID. A mapping is also used for storing the anonymous identifier pointing to the
encrypted record location. The ‘storeRecordLocation’ function is called by passing in the
256-bit identifier and the string record location. A string is used for flexibility so
that any cloud storage provider can be used. The Keccak-256 hash function in Solidity is
used to check that the location in the mapping is empty before use. This is because Solidity
does not support direct string comparison; therefore, the strings must be hashed before being
compared. This prevents an attacker from overwriting the storage location of another patient’s
EHR. No functionality to delete identifiers is provided, as it is always possible to
retrospectively look at interactions with the blockchain and view the deleted identifiers. The
form of revocation in the system is based on the off-chain storage mechanism. The smart
contract code is included in the appendix.


4.2.2 Development and Testing 


Hardhat was used for the smart contract development, testing and deployment [30]. Hardhat
is a development environment running on JavaScript that allows for rapid development
on Ethereum. Hardhat allows JavaScript-style console.log calls to be included in the Solidity
code, which speeds up the debugging process. Smart contracts are compiled by Hardhat, and any
syntax errors in the contract code are shown after compilation.


Mocha was used for testing the smart contract [31]. The functionality of the smart contract
was tested to ensure no issues or security vulnerabilities existed. The smart contract was
deployed on a local instance of the Ethereum blockchain, which was run on Hardhat. The local
machine is responsible for validating the blocks. The Web3.js JavaScript library was used to
call the contract functions from a web app. The web app can be used for demonstrating the
functionality of the system. A custom client for handling blockchain interactions should be
used for real-world applications. A link to the web app for testing the project can be found
in the appendix. It contains documentation for all operations and requires the MetaMask
extension. Three hundred and ninety-five hashed irises from the CASIA-Iris-Interval dataset
have already been registered.


4.2.3 Anonymising Institution Behaviour 


As the blockchain is public, anyone can see the interactions with the contract. This includes
the wallets registering the iris scans, the record identifiers and the encrypted record
locations. Some countermeasures are used to anonymise the actions of healthcare institutions.
Using single-use blockchain accounts prevents linking transactions to a particular
institution. If one account is used for all transactions, it is possible to associate record
locations with a particular institution. Another anonymising technique is to use a random
waiting time between transactions. Instead of immediately creating a transaction, the client
used to interact with the blockchain should put the patient’s transactions into a queue in
random order. This makes associating or linking transactions together based on their proximity
to each other more difficult.
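
The queueing idea can be sketched as a small scheduler. This is an illustrative assumption about how a client might implement it, not part of the described system: the function name, the delay bounds and the transaction labels are hypothetical, and creating single-use accounts is left out.

```python
import random


def schedule(transactions, min_delay_s, max_delay_s, rng=None):
    """Shuffle pending transactions and assign each a randomised send time.

    Returns a list of (send_at_seconds, transaction) pairs in sending order,
    with a random gap between consecutive transactions.
    """
    rng = rng or random.Random()
    shuffled = list(transactions)
    rng.shuffle(shuffled)
    plan = []
    send_at = 0.0
    for tx in shuffled:
        send_at += rng.uniform(min_delay_s, max_delay_s)
        plan.append((send_at, tx))
    return plan


# Three pending transactions for one patient, spaced 1 to 10 minutes apart.
pending = ["registerHashOfScan", "addTransaction", "storeRecordLocation"]
plan = schedule(pending, 60, 600, random.Random(1))
```

In a real client, transactions from several patients would share one queue so that neighbouring transactions on the ledger are less likely to belong to the same patient.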


4.3 Key Derivation Function Parameters 


One of the hash functions used for the identifiers is a KDF. The KDF is used for creating
the PrivateID, as the entropy of the parameters used in the identifier is low. Simply
hashing PII is not sufficient, according to Marx et al. [23]. Password cracking tools such as
HashCat can be leveraged to brute-force identifiers derived by plain hashing [15]. Recall that
the PrivateID is the hash of the concatenation of the HoS, a PIN and a social security
number or equivalent SSN. A social security number in the U.S. comprises 9 digits
in the form XXX-XX-XXXX. The entropy of these parameters and of the identifier is shown in
Table 4.1. The British National Insurance Number (NIN) and Irish Personal Public Service
Number (PPSN) have similar entropy. Multiplication of the individual components of the
identifier yields the entropy of the identifier. N is the number of registered HoSs on the
blockchain. AWS provides a service that can compute 60 billion SHA-256 hashes in a second
[19]. This would mean that, in the worst case, all the PrivateIDs could be brute-forced in
approximately 5 hours if no KDF or salt is used. The PublicID is used as the salt for the KDF
to prevent the use of rainbow tables and precomputation.
Identifier | Entropy
HoS | N
PIN | 2^20
SSN | 2^30
PrivateID | 2^20 · 2^30 · N

Table 4.1: PrivateID Entropy
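
The brute-force estimate without a KDF can be reproduced with a short calculation. The sketch below assumes a single known HoS (N = 1) and uses the 60 billion hashes per second quoted above; the function names are hypothetical.

```python
PIN_SPACE = 2 ** 20   # 6-digit PIN, about 10^6 values
SSN_SPACE = 2 ** 30   # 9-digit SSN, about 10^9 values


def search_space(n_hos: int = 1) -> int:
    """Number of PrivateID candidates: PIN x SSN x number of candidate HoSs."""
    return PIN_SPACE * SSN_SPACE * n_hos


def worst_case_hours(space: int, hashes_per_second: float) -> float:
    """Hours needed to try every candidate at the given hash rate."""
    return space / hashes_per_second / 3600


hours = worst_case_hours(search_space(), 60e9)
```

With these figures the result is a little over five hours, in line with the estimate in the text, which is why a deliberately slow, salted KDF is used instead of a plain hash.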


Argon2 is the KDF chosen for the system. Argon2 has better protection against
brute-force attacks on ASICs and GPUs than PBKDF2. It is also optimised for the x86
architecture; therefore, normal machines found in healthcare institutions should have no problem
running it. The Argon2i variation is used in the system as it is more suitable for password
hashing and password-based key derivation. It is resistant to side-channel attacks, and it is
slower as it makes more passes over the memory to protect against tradeoff attacks. In 2016,
Biryukov et al. recommended, for key derivation for hard-drive encryption that takes 3 seconds
on a 2 GHz CPU using two cores, Argon2i with four lanes and 6 GiB of RAM [4]. As average
consumer-grade hardware has improved since then, it is reasonable to assume that a setting
that takes 3 seconds on older hardware is less secure today. For the system, we take the
Argon2 authors’ recommendation and instead aim for 3 seconds on more modern hardware, such as
a 4 GHz CPU using four cores with 8 GiB of RAM. Testing the speed of Argon2i using various
parameters on a modern machine is conducted in Chapter 5 to find suitable parameters for the
system.


4.4 File Storage and Sharing 


The patient’s EHRs are stored on IPNS, although the system can support the use of any
cloud-storage provider. The choice of using a string for the record location was made to
support any provider. The deletion of the EHR on IPNS is the revocation mechanism used.
The patient should delete the stored EHR if they suspect an institution has acted maliciously
or if the security of the system has weakened over time. An example of this is if the KDF
used has developed a vulnerability. In this case, the EHRs should be deleted and uploaded
again using the updated KDF. An overview of how encryption is used in the system can be
found in Figure 4.1.


4.4.1 Personal Storage and Temporary Access


If the patient wants to store their EHR for their own use or to give institutions only temporary
access, they encrypt it using their own $S_k$. The EHRs are encrypted using AES-256. Temporary
access is defined as showing the EHRs to a physician on a device during a visit without
providing the institution with any way to download the file.


4.4.2 Granting Access to Institutions 


If the patient wishes to allow a healthcare institution to have access to their EHRs, they store
a separate instance of the EHR on IPNS. This instance is encrypted using RSA-2048 with the
institution’s public key. If the patient wishes to revoke the institution’s
access to the EHR, they delete the EHR from IPNS. Institutions are responsible for deleting any
locally stored instances of the EHR once the patient revokes their access to it.
Figure 4.1: Encryption overview


CHAPTER 5 


Evaluation 


5.1 Iris Matching 


The CASIA-Iris-Interval dataset was evaluated in terms of iris extraction speed, hashing and
matching accuracy. A new approach for increasing the iris matching accuracy, not evaluated by
Braddish, is also explored. This approach involves merging the hashes of both of the user’s
irises into a single hash. The theory behind this approach is that it will reduce false
negatives, as one iris scan can make up for deficits in the other. This process can be seen in
Figure 5.1. The iris extraction code automatically discards irises where the pupil cannot be
found inside the iris. This occurred in 60 of the 2639 irises in the dataset. Braddish found
that four rotations were optimal for the CASIA-Iris-Interval dataset, and this was confirmed
in testing [5]. A normalised Hamming distance of 0.4 was set as the threshold for
distinguishing between hashes of the same iris and of a different iris. The angular resolution
used was 200, and the radial resolution used was 28.


Figure 5.1: Comparison of single iris hashing and combined iris hashing

Figure 5.2: Comparison of the two metrics used in evaluating the iris comparison

The false rejection rate (FRR) and false acceptance rate (FAR) are metrics that can
be used to measure the accuracy of the iris matching. The lower each value, the more
accurate the comparisons are. Another metric that can be used is the equal error rate (EER).
This is the value at which the FRR equals the FAR. It is independent of the threshold used
to distinguish between the classes. Thus, it is an easier metric to gauge in our case.
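
The three metrics can be computed from two lists of distances, one for same-eye comparisons and one for different-eye comparisons. The sketch below is a generic illustration, not the evaluation code used for Table 5.1: the function names and toy distances are hypothetical, and the EER is approximated by sweeping the threshold in fixed steps.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) at a threshold.

    genuine:  distances between hashes of the same eye
    impostor: distances between hashes of different eyes
    A pair is accepted as a match when its distance is below the threshold.
    """
    frr = sum(d >= threshold for d in genuine) / len(genuine)
    far = sum(d < threshold for d in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor, steps=1000):
    """Sweep thresholds in [0, 1] and return (EER, threshold) where FAR and FRR
    are closest; the EER is reported as their mean at that point."""
    best = None
    for i in range(steps + 1):
        t = i / steps
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy distances, chosen only to make the arithmetic easy to follow.
genuine = [0.1, 0.2, 0.3, 0.45]
impostor = [0.35, 0.5, 0.6, 0.7]
far, frr = far_frr(genuine, impostor, 0.4)
eer, eer_threshold = equal_error_rate(genuine, impostor)
```

Because the EER comes from sweeping the threshold, it summarises the whole trade-off curve rather than one operating point such as 0.4.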


From the results shown in Table 5.1, it was determined that using a single iris-to-iris
comparison with four rotations was sufficient. Combining the irises increased the matching
accuracy; however, it also significantly increased the time it took to compare the irises, as
many more comparisons were required. A time of 17.2s to find a matching iris with one million
registered patients was deemed reasonable. Different countries and states could use separate
registration systems to reduce the number of patients registered on a single blockchain.
Using four rotations with iris-to-iris comparison requires nine total comparisons: the original
scan and four rotations in both the left and right directions. In contrast, combined iris
comparison requires eighty-one total comparisons, as each iris yields nine hashes and
there are eighty-one total combinations of two sets of nine hashes. Further
efforts to speed up the comparison process could be explored, such as rewriting the code in
C++. Additionally, both of the patient’s irises must be scanned. This increases the time it
takes to operate the system. If no match is found, the patient’s iris is scanned
again, and the process is repeated.
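
As a worked check of the comparison counts, with k rotations in each direction the number of hash comparisons per registered scan is:

$$C_{\text{single}} = 2k + 1 = 9, \qquad C_{\text{combined}} = (2k + 1)^2 = 81 \qquad (k = 4)$$

The ratio 81 / 9 = 9 is close to the ratio of the measured times for four rotations in Table 5.1, 154 s / 17.2 s ≈ 8.95, which suggests that the comparison time scales roughly with the number of hash comparisons.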


Class | FAR | FRR | EER | Match Found | Rotations | Time / 1M Scans (s)
Iris-to-Iris | 0.00063 | 0.21212 | 0.0618 | 0.788 | 1 | 6.4
Iris-to-Iris | 0.00178 | 0.13451 | 0.0431 | 0.865 | 4 | 17.2
Combined Irises | 0.00052 | 0.17526 | 0.0361 | 0.825 | 1 | 17.9
Combined Irises | 0.00373 | 0.05683 | 0.0202 | 0.943 | 4 | 154

Table 5.1: Iris matching accuracy and time with 0.4 decision threshold
Figure 5.3: Iris-to-iris comparison with four rotations

Figure 5.4: Combined iris comparison with four rotations


5.2 Argon2i Parameters 


Argon2i was tested using the official implementation provided on the Password Hashing
Competition’s GitHub [35]. The KDF is implemented in C. Several bindings also exist for
other programming languages, which makes it accessible to a wider population. The
goal was to find parameters that give a hashing time of approximately 3 seconds on a modern
machine. A Ryzen 5 3600 at 4.2GHz with 16GB of DDR4 3200MHz memory was used in
testing. The output hash length was 32 bytes, and the salt length was 32 bytes. The following
results were obtained.


Iterations | Memory | Threads | Time (s)
1 | 512MiB | 1 | 0.53
1 | 512MiB | 2 | 0.58
1 | 512MiB | 4 | 0.66
1 | 512MiB | 8 | 0.83
1 | 1024MiB | 1 | 1.05
1 | 1024MiB | 2 | 1.11
1 | 1024MiB | 4 | 1.39
1 | 1024MiB | 8 | 1.81
1 | 2048MiB | 1 | 2.26
1 | 2048MiB | 2 | 2.31
1 | 2048MiB | 4 | 2.69
1 | 2048MiB | 8 | 3.84
2 | 512MiB | 4 | 1.00
2 | 512MiB | 8 | 1.42
2 | 1024MiB | 4 | 2.06
2 | 1024MiB | 8 | 2.90
2 | 2048MiB | 1 | 3.27
3 | 1024MiB | 2 | [illegible]
3 | 1024MiB | 4 | 2.81
3 | 1024MiB | 8 | 4.27
4 | 1024MiB | 4 | 3.61

Table 5.2: Time taken to run Argon2i using various parameters


There is a fine balance between the iteration count, memory and threads used in determining
the speed of the hashing. Increasing any of these parameters increases the computational
cost. Increases in the iteration count result in a relatively linear increase in time. Increases
in memory result in linear increases if sufficient memory exists on the machine. If there
is insufficient memory, the cost is increased exponentially. Thread count allows for more
parallelism to be utilised. 1024MiB of memory was chosen on the basis that some outdated
machines may still be running with 4GiB of memory or less. Four threads were chosen as this
is the average core count of computers according to the latest Steam hardware survey [45].
While core count and thread count can differ, the core count is usually equal to the thread
count or half of it. Three iterations gave a result of 2.81 seconds, which was close
to the target of 3 seconds. Therefore, the parameters chosen are three iterations, 1024MiB
of memory and four threads. In the worst case, an attacker must try $2^{49}$ combinations on
average to brute-force a PrivateID. This would take 53.5 million years given a 3 second
time per hash. This was considered secure for this use case. The added PublicID salt also
ensures that each PrivateID must be computed separately, and rainbow tables cannot be
used.
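
As a worked check of this estimate, take the average-case search as $2^{49}$ guesses (half of the $2^{20} \cdot 2^{30} = 2^{50}$ PIN and SSN combinations in Table 4.1) at 3 seconds per guess on a single machine:

$$2^{49} \times 3\,\text{s} \approx 1.69 \times 10^{15}\,\text{s} \approx \frac{1.69 \times 10^{15}\,\text{s}}{3.156 \times 10^{7}\,\text{s/year}} \approx 5.35 \times 10^{7}\ \text{years}$$

This assumes no parallelism on the attacker's side; an attacker with many machines would divide the time accordingly, although the 1024MiB memory requirement per guess limits how cheaply that can be done.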


5.3 Blockchain Transaction Costs


As the Ethereum mainnet was chosen as the blockchain, it was important to evaluate the cost
of registering iris scans and records. The gas used is a fixed value based on the complexity
of the function and size of the data sent to the contract. The cost range is the range from the
lowest price on the cheapest day to the lowest price on the most expensive day over the last
week. Only writing data costs gas; therefore, retrieving records is free.


Function | Gas Used | Cost Range (ETH) | Cost Range ($)
registerHashOfScan | 95273 | 0.00124-0.00400 | 3.81-12.30
addTransaction | 68904 | 0.00090-0.00289 | 2.75-8.89
storeRecordLocation | 94176 | 0.00122-0.00396 | 3.76-12.15

Table 5.3: Gas costs for the contract functions
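
The relationship between gas used and transaction cost can be sketched as follows. The gas prices of roughly 13 and 42 gwei and the Ether price of roughly 3075 USD are not stated in the text; they are back-calculated from Table 5.3 and should be read as assumptions. The function names are hypothetical.

```python
GWEI_PER_ETH = 10 ** 9


def tx_cost_eth(gas_used: int, gas_price_gwei: float) -> float:
    """Transaction cost in ETH: gas used multiplied by the gas price."""
    return gas_used * gas_price_gwei / GWEI_PER_ETH


def tx_cost_usd(gas_used: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Transaction cost in USD at a given ETH price."""
    return tx_cost_eth(gas_used, gas_price_gwei) * eth_usd


# registerHashOfScan at the assumed low and high gas prices.
low_eth = tx_cost_eth(95273, 13)
high_eth = tx_cost_eth(95273, 42)
low_usd = tx_cost_usd(95273, 13, 3075)
```

Since the gas used per function is fixed, the cost range in the table is driven entirely by the gas price and the Ether exchange rate at the time of the transaction.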
The transaction fees were high, as expected, because the Ethereum mainnet was used. Using a
Layer 2 Ethereum blockchain such as Optimism or Arbitrum to reduce the transaction fees
is a possible solution; however, it may sacrifice security for cheaper fees. One way to minimise
the transaction fees would be to only interact with the blockchain when gas prices are
low. This usually occurs in the middle of the night and at the weekend.


5.4 IPNS Speed 


IPNS was evaluated for download and upload time using various file sizes to verify that it
was a suitable EHR storage mechanism. After testing, it was determined that upload speed
to IPFS was directly related to the internet connection used and not to IPFS itself. After the
file is uploaded to IPFS, the file’s CID is published to IPNS. Requesting a file directly after
uploading it failed to download the file in all cases. A short amount of time (1-5 minutes)
is required before the file can be resolved correctly. This time is dependent on the file’s size.
This waiting period was also found to occur if a file had not been accessed for some time.


Size (MB) | Download Time (s)
1 | 0.8
22 | 24
102 | 119
380 | 510

Table 5.4: IPNS speed evaluation


The download times were found to be acceptable for smaller file sizes such as 1MB and
22MB. File sizes for medical data depend on the type of file. A text document detailing
a patient’s medical history could have a size of less than a megabyte, while a computed
tomography (CT) scan could have a size of 20-30 megabytes. It was determined that IPNS
was sufficient in terms of speed for the proposed use case, but improvements could be made
in this component of the system. The availability of files on IPNS after a
long time must also be examined. It was not possible to evaluate this for this project.


CHAPTER 6 


Conclusion 


An EHR storage framework was designed and evaluated in this project. An iris extraction
algorithm was used to derive information from an iris scan. An LSH algorithm called S3Hash
was used to hash the iris scans to hide the user’s raw iris data. A set of privacy-preserving
identifiers was designed and evaluated for security. A smart contract with the relevant
functionality was used to store the identifiers and hashed iris scans. IPNS, a decentralised
mutable storage system, was used to store the EHRs.


6.1 Shortcomings 


In the evaluation section, a number of potential issues with the framework were discovered.
Firstly, IPNS was slower than expected. It was also not possible to evaluate how persistent
the files were. Secondly, using the Ethereum mainnet raises cost concerns. Registering
iris scans and storing identifiers are costly when considering a large population. Using a
custom private Ethereum blockchain could be an alternative, but security in the form of
decentralisation may be sacrificed.
6.2 Future Work


Further work should be done on exploring decentralised mutable storage systems. One
protocol of interest is DNSLink, which can still utilise the IPFS network while reportedly
having faster speeds [16]. The download speeds achieved in testing IPNS were not up to the
standard of typical cloud storage providers. There should not be any significant penalty in
efficiency when using a blockchain-based EHR storage system.


Implementing the iris comparison code in a lower-level programming language such as C++
could yield performance increases. The code could also be parallelised to further speed up
the comparison. Using a faster programming language for extracting the iris templates
is another area to look into. This is of lower priority than the comparison, however,
as extracting a single iris scan at a time is still very fast. Efficiency issues only arise
when extracting large numbers of scans.
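Before moving to C++, part of the comparison speed-up may be obtainable within Python by replacing the nested loops with NumPy broadcasting. The following is a sketch under assumptions, not the project's code: the function name `best_hamming_distances` and the array layout (one row per hash, one column per bit) are illustrative choices.

```python
import numpy as np


def best_hamming_distances(target_hashes, stored_hashes):
    """Compare all rotations of one probe against every stored hash at once.

    target_hashes: (r, bits) array, one row per rotation of the probe scan.
    stored_hashes: (n, bits) array, one row per registered scan.
    Returns an (n,) array holding, for each stored hash, the smallest
    normalised Hamming distance over the r rotations.
    """
    target = np.asarray(target_hashes, dtype=np.uint8)
    stored = np.asarray(stored_hashes, dtype=np.uint8)
    # Broadcasting (r, 1, bits) against (1, n, bits) gives an (r, n, bits) mask.
    diff = target[:, None, :] != stored[None, :, :]
    distances = diff.sum(axis=2) / target.shape[1]
    return distances.min(axis=0)


# Synthetic data: 1000 random 512-bit hashes and a probe that is a
# slightly corrupted copy of entry 42.
rng = np.random.default_rng(0)
stored = rng.integers(0, 2, size=(1000, 512), dtype=np.uint8)
probe = stored[42].copy()
probe[:10] ^= 1  # flip the first 10 bits

dists = best_hamming_distances([probe], stored)
print(int(dists.argmin()), dists.min())
```

The intermediate boolean mask grows with r × n × bits, so for large databases the stored hashes would need to be processed in chunks; packing bits into machine words is a further option not shown here.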


The ultimate test of the framework is deployment in the real world. This could
take place within a healthcare institution or a smaller setting to validate the operation,
security and efficiency of the framework. The modularity of the framework
allows components to be interchanged and modified as seen fit by the institution, and
thus testing could also occur in other settings.


APPENDIX A 


Iris Comparison Code 


from email.mime import base
import math
import os
from cv2 import sqrt
from numpy.lib.function_base import percentile
import random
import matplotlib.pyplot as plt
import json
import numpy as np
from matplotlib.ticker import PercentFormatter
import time

from random import randrange

COL = 28
ROW = 400
# file resolution codename
RES = "20028"
BITS = 512
THRESHOLD = 0.4
MAX_FA = 1000000*205*3*36500

vectorsFile = f"vectors{RES}.npy"
dataFilePath = f"result{RES}"
hashesFile = f"hashes{RES}.npy"
hashesPairFile = f"hashespair{RES}.npy"
hashesToBeStored = f"hashesbc{RES}.npy"
curr_dir = os.path.dirname(os.path.realpath(__file__))

baselineIrises = []
baselineAttributes = []
irisCodeDataset = []  ## USED FOR HASHING (STRING FORMAT)
attributes = []
vectors = []


class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)


def padAndSplit512Hex(str):
    # strip the "0x" prefix and left-pad with zeros to BITS/4 hex digits
    res = str[2:]
    while(len(res) != BITS/4):
        res = "0" + res

    return res


def globalMask():
    mask = np.zeros(ROW*COL)
    for subject in os.listdir(curr_dir+f"/{dataFilePath}/Masks"):
        f = open(curr_dir+f"/{dataFilePath}/Masks/" + subject)
        data = json.load(f)
        for attribute in data:
            local_mask = np.array(data[attribute])
            mask = np.add(mask, local_mask)
    mask = np.divide(mask, (ROW*COL))
    # a single comparison is used here: assigning False and then True in two
    # steps would overwrite the first assignment and mark every bit as True
    return mask < 0.06


def rotateLeft(list, columnLen):
    return np.roll(list, -columnLen)


def rotateRight(list, columnLen):
    return np.roll(list, columnLen)


def hamming_distance(hash1, hash2):
    # fraction of differing bits between two hashes
    return np.count_nonzero(hash1 != hash2) / BITS


def decidability_index(mean_1, mean_2, std_1, std_2):
    return (abs(mean_1 - mean_2) / sqrt((std_1**2 + std_2**2) / 2)[0])[0]


def SimHash(vector, len, left=True):
    result = []
    midpoint = int(BITS/2)-1
    if (left):
        midpoint = 0

    # one output bit per random vector: the sign of the dot product
    for i in range(len):
        matrixMul = np.dot(vector, vectors[i+midpoint])
        if (matrixMul >= 0):
            result.append(1)
        else:
            result.append(0)
    return np.array(result)


def SimHashString(vector, len, left=True):
    result = ""
    midpoint = int(BITS/2)-1
    if (left):
        midpoint = 0

    for i in range(len):
        matrixMul = np.dot(vector, vectors[i+midpoint])
        if (matrixMul >= 0):
            result += "1"
        else:
            result += "0"
    return padAndSplit512Hex(hex(int(result, 2)))


def getRandomVectors():
    allData = []
    for subject in os.listdir(curr_dir+f"/{dataFilePath}/MaskedTemplates")[0:200]:
        f = open(curr_dir+f"/{dataFilePath}/MaskedTemplates/" + subject)
        data = json.load(f)
        for attribute in data:
            template = np.array(data[attribute])
            allData.append(template)

    randomVectors = []
    while True:
        print("Getting the ", len(randomVectors), " vector")
        randomness = 0
        vectorr = np.random.choice([-1, 0, 1], size=COL*ROW)
        for x in allData:
            if(np.dot(x, vectorr) >= 0):
                randomness = randomness + 1

        # keep only vectors that split the sample templates roughly in half
        percentage_randomness = randomness/len(allData)
        if (percentage_randomness >= 0.4 and percentage_randomness <= 0.6):
            randomVectors.append(vectorr)
            if (len(randomVectors) == BITS):
                break

    np_randomVectors = np.array(randomVectors)
    np.save(vectorsFile, np_randomVectors, allow_pickle=True)


def hashAllScans(rotations):
    print("Hashing the scans...")
    hashes = []
    for subject in os.listdir(curr_dir+f"/{dataFilePath}/MaskedTemplates"):
        f = open(curr_dir+f"/{dataFilePath}/MaskedTemplates/" + subject)
        data = json.load(f)
        for attribute in data:
            local_hashes = []
            template = np.array(data[attribute])
            #template = np.logical_and(template, gm)*1
            local_hashes.append(SimHash(template, BITS))
            for i in range(1, rotations+1):
                templater_rotated = rotateRight(template, COL*i*2)
                templatel_rotated = rotateLeft(template, COL*i*2)
                local_hashes.append(
                    SimHash(templater_rotated, BITS))
                local_hashes.append(
                    SimHash(templatel_rotated, BITS))
            hashes.append({"eye": attribute[-6:], "hashes": local_hashes})
    np.save(hashesFile, hashes, allow_pickle=True)


def hashAllPairScans(rotations=0):
    hashes = []
    for subject in os.listdir(curr_dir+f"/{dataFilePath}/MaskedTemplates"):
        f = open(curr_dir+f"/{dataFilePath}/MaskedTemplates/" + subject)
        data = json.load(f)
        ##separate the pairs
        left = []
        right = []
        person = ""
        index = 0
        for attribute in data:
            person = attribute[-6:-3]
            target_eye = attribute[-3]  #L or R
            if(target_eye == "L"):
                left.append(data[attribute])
            else:
                right.append(data[attribute])

        print(person)
        for le in range(0, len(left)):
            for re in range(0, len(right)):
                local_hashes = []
                index += 1
                template_l = np.array(left[le])
                template_r = np.array(right[re])
                #local_hashes.append(np.concatenate([SimHash(template_l, int(BITS/2)), SimHash(template_r, int(BITS/2), False)]))
                for i in range(-rotations, rotations+1):
                    for j in range(-rotations, rotations+1):
                        templatel_rotated = rotateRight(template_l, COL*i*2)
                        templater_rotated = rotateRight(template_r, COL*j*2)
                        local_hashes.append(np.concatenate([SimHash(templatel_rotated, int(BITS/2)),
                                                            SimHash(templater_rotated, int(BITS/2), False)]))
                l_id = f"00{le}"
                r_id = f"00{re}"
                if le > 9:
                    l_id = f"0{le}"
                if re > 9:
                    r_id = f"0{re}"

                hashes.append({"person": f"{person}-L{l_id}-R{r_id}",
                               "hashes": local_hashes})
    np.save(hashesPairFile, hashes, allow_pickle=True)


def compareIrisHashes(target, same_eye, rotations=0):
    baselineIrises = []
    for d in hashes:
        hash = d["hashes"]
        #print(len(hash))
        subject = d["eye"]
        if target in subject:
            baselineIrises = hash
            baselineAttributes.append(subject)
        else:
            irisCodeDataset.append(hash[0])
            attributes.append(subject)

    crosshashing = []
    total_accepted = 0
    same_eye_accepted = 0
    total_same_eye = 0
    best_match_str = ""
    best_match = BITS
    for y1, a1 in zip(irisCodeDataset, attributes):
        best = BITS
        best_dist = 0

        for y2 in baselineIrises:
            diff = hamming_distance(y1, y2)
            if (diff < best):
                best = diff
                best_dist = diff
            if(diff < best_match):
                best_match = diff
                best_match_str = a1
        same = best_dist <= THRESHOLD
        if (same_eye in a1):
            if (same):
                same_eye_accepted = same_eye_accepted + 1
            total_same_eye = total_same_eye + 1
        else:
            if (same):
                total_accepted = total_accepted + 1
        crosshashing.append(best_dist)
    for x, y in zip(crosshashing, attributes):
        print(y + ": " + str(x))

    FRR = (total_same_eye-same_eye_accepted) / total_same_eye
    FAR = total_accepted / int(len(irisCodeDataset))

    print("FRR: " + str(FRR))
    print("FAR: " + str(FAR))
    print("Best match: " + best_match_str[-6:] + " : " + str(best_match))

    return (FAR, FRR)


def compareHashes(target1, target2):
    hash1 = ""
    hash2 = ""
    for subject in os.listdir(curr_dir+f"/{dataFilePath}/MaskedTemplates"):
        f = open(curr_dir+f"/{dataFilePath}/MaskedTemplates/" + subject)
        data = json.load(f)

        for attribute in data:
            if target1 in attribute:
                template = data[attribute]
                hash1 = SimHash(template, BITS)
            elif target2 in attribute:
                template = data[attribute]
                hash2 = SimHash(template, BITS)
    print(hamming_distance(hash1, hash2))


def getHashOfIrisScan(target, rotations=0):
    for subject in os.listdir(curr_dir+f"/{dataFilePath}/maskedTemplates"):
        f = open(curr_dir+f"/{dataFilePath}/maskedTemplates/" + subject)
        data = json.load(f)

        for attribute in data:
            if target in attribute:
                local_hashes = []
                template = (
                    (np.reshape(np.array(data[attribute]), (COL, ROW))).transpose()).flatten()
                local_hashes.append(SimHash(template, BITS))

                for i in range(1, rotations+1):
                    local_hashes.append(
                        SimHash(rotateRight(template, COL*i*2), BITS))
                    local_hashes.append(
                        SimHash(rotateLeft(template, COL*i*2), BITS))

                return np.array(local_hashes)


def printHashOfIrisScan(target):
    for subject in os.listdir(curr_dir+f"/{dataFilePath}/maskedTemplates"):
        f = open(curr_dir+f"/{dataFilePath}/maskedTemplates/" + subject)
        data = json.load(f)

        for attribute in data:
            if target in attribute:
                template = np.array(data[attribute])
                splitHash = SimHashString(template, BITS)
                print(splitHash)
                return splitHash


def compareAllIrisHashes(skip=1, rotations=0):
    start = time.process_time()
    total_scans = 0
    same_eye_accepted = 0
    total_same_eye = 0
    total_accepted = 0
    different_eye_comparisons = 0
    mean_1 = []
    mean_2 = []
    #print(len(hashes))
    total_comparisons = 0

    for i in range(0, len(hashes), skip):
        total_scans = total_scans + 1
        target_hashes = hashes[i]["hashes"][0:((1+(rotations*2)))]
        target_eye = hashes[i]["eye"][:4]
        target_subject = hashes[i]["eye"]

        #print("working on ", target_subject)
        for comparison_hash in hashes[i:]:
            ## get the non-rotated hash
            chh = comparison_hash["hashes"][0]
            che = comparison_hash["eye"]
            #don't compare the same eye scan to itself
            if(target_subject in che):
                continue
            best_hd = 1
            for t_h_i in target_hashes:
                hd = hamming_distance(t_h_i, chh)
                if(hd < best_hd):
                    best_hd = hd
            same = best_hd <= THRESHOLD
            total_comparisons = total_comparisons + 1
            ## if we are comparing the same eyes
            if(target_eye in che):
                mean_1.append(best_hd)
                if (same):
                    same_eye_accepted = same_eye_accepted + 1
                total_same_eye = total_same_eye + 1
            ## comparing different eyes
            else:
                mean_2.append(best_hd)
                if (same):
                    total_accepted = total_accepted + 1
                different_eye_comparisons = different_eye_comparisons + 1
    print("FRR: ", str((total_same_eye-same_eye_accepted) / total_same_eye), "\nFAR: ", str(
        total_accepted/different_eye_comparisons))
    mean_1 = np.array(mean_1)
    mean_2 = np.array(mean_2)
    plt.hist(mean_1, label="Same class", weights=np.ones(len(mean_1)) / len(mean_1), bins=50,
             fc=(0, 0, 1, 0.5))
    plt.hist(mean_2, label="Different class", weights=np.ones(len(mean_2)) / len(mean_2), bins=50,
             fc=(1, 0, 0, 0.5))
    plt.axvline(x=THRESHOLD, ymin=0.01, ymax=0.99, color='r', label="Threshold", linewidth=3)
    #plt.yscale("log")
    plt.title("Eye to Eye Comparison")
    plt.legend()
    plt.xlabel("Hamming distance")
    plt.ylabel("Frequency")
    std_1 = np.std(mean_1)
    std_2 = np.std(mean_2)
    mean_1 = np.mean(mean_1)
    mean_2 = np.mean(mean_2)
    plt.gca().yaxis.set_major_formatter(PercentFormatter(1))

    #plt.spines['left'].set_color('white') # setting up Y-axis tick color to red
    #plt.spines['top'].set_color('white') #setting up above X-axis tick color to red
    print("Decidability index: ", decidability_index(mean_1, mean_2, std_1, std_2))
    print("Total Comparisons : ", total_comparisons)
    print("time taken : ", time.process_time() - start)
    plt.show()


def compareAllIrisHashesPerson(skip=1, rotations=0):
    start = time.process_time()
    total_scans = 0
    same_eye_accepted = 0
    total_same_eye = 0
    total_accepted = 0
    different_eye_comparisons = 0
    mean_1 = []
    mean_2 = []
    #print(len(hashes))
    total_comparisons = 0
    k = (1+(rotations*2))**2
    rr = (1+(rotations*2))**2

    print(len(hashes))

    for i in range(0, len(hashes), skip):
        total_scans = total_scans + 1
        target_hashes = hashes[i]["hashes"][0:k]
        target_person = hashes[i]["person"][0:3]
        target_id = hashes[i]["person"]
        target_left_eye = target_id[4:8]
        target_right_eye = target_id[9:]

        #print("working on ", target_subject)
        for comparison_hash in hashes[i:]:
            ## get the non-rotated hash
            chh = comparison_hash["hashes"][math.floor(rr/2)]
            chp = comparison_hash["person"]
            chpp = comparison_hash["person"][0:3]
            #left eye id
            chpl = comparison_hash["person"][4:8]
            #right eye id
            chpr = comparison_hash["person"][9:]
            #don't compare the same eye scan to itself
            if(target_id == chp or (chpp == target_person and (target_left_eye == chpl or
                                                               target_right_eye == chpr))):
                continue
            best_hd = 1
            for t_h_i in target_hashes:
                hd = hamming_distance(t_h_i, chh)
                if(hd < best_hd):
                    best_hd = hd
            same = best_hd <= THRESHOLD
            total_comparisons += 1
            ## if we are comparing the same eyes
            if(target_person == chpp):
                mean_1.append(best_hd)
                if (same):
                    same_eye_accepted = same_eye_accepted + 1
                total_same_eye = total_same_eye + 1
            ## comparing different eyes
            else:
                mean_2.append(best_hd)
                if (same):
                    total_accepted = total_accepted + 1
                different_eye_comparisons = different_eye_comparisons + 1
    print("FRR: ", str((total_same_eye-same_eye_accepted) / total_same_eye), "\nFAR: ", str(
        total_accepted/different_eye_comparisons))
    mean_1 = np.array(mean_1)
    mean_2 = np.array(mean_2)
    plt.hist(mean_1, label="Same class", weights=np.ones(len(mean_1)) / len(mean_1), bins=50,
             fc=(0, 0, 1, 0.5))
    plt.hist(mean_2, label="Different class", weights=np.ones(len(mean_2)) / len(mean_2), bins=50,
             fc=(1, 0, 0, 0.5))
    plt.axvline(x=THRESHOLD, ymin=0.01, ymax=0.99, color='r', label="Threshold", linewidth=3)
    #plt.yscale("log")
    plt.title("Eyes Combined Comparison")
    plt.legend()
    plt.xlabel("Hamming distance")
    plt.ylabel("Frequency")
    std_1 = np.std(mean_1)
    std_2 = np.std(mean_2)
    mean_1 = np.mean(mean_1)
    mean_2 = np.mean(mean_2)
    plt.gca().yaxis.set_major_formatter(PercentFormatter(1))

    print("Decidability index: ", decidability_index(mean_1, mean_2, std_1, std_2))
    print("Total Comparisons : ", total_comparisons)
    print("time taken : ", time.process_time() - start)
    plt.show()


def getScansToBC():
    hashes = []

    for subject in os.listdir(curr_dir+f"/{dataFilePath}/MaskedTemplates"):
        f = open(curr_dir+f"/{dataFilePath}/MaskedTemplates/" + subject)
        data = json.load(f)
        ##separate the pairs
        left = False
        right = False
        for attribute in data:
            target_eye = attribute[-3]
            if(target_eye == "L" and not left):
                left = True
                sh = SimHashString(np.array(data[attribute]), BITS)
                hashes.append(sh[0])
                hashes.append(sh[1])
            elif (target_eye == "R" and not right):
                right = True
                sh = SimHashString(np.array(data[attribute]), BITS)
                hashes.append(sh[0])
                hashes.append(sh[1])
    np.save(hashesToBeStored, hashes, allow_pickle=True)


def BCSearchSimulation(rotations=0):
    hashes = []
    test_hashes = []

    for subject in os.listdir(curr_dir+f"/{dataFilePath}/MaskedTemplates"):
        f = open(curr_dir+f"/{dataFilePath}/MaskedTemplates/" + subject)
        data = json.load(f)
        ##separate the pairs
        left = False
        right = False
        twofa_l = randrange(MAX_FA)
        twofa_r = randrange(MAX_FA)
        for attribute in data:
            temp_hash = []
            target_eye = attribute[-3]
            template = np.array(data[attribute])
            sh = SimHash(template, BITS)
            # the first scan of each eye is "registered"; later scans are test probes
            if(target_eye == "L" and left):
                for i in range(1, rotations+1):
                    templater_rotated = rotateRight(template, COL*i*2)
                    templatel_rotated = rotateLeft(template, COL*i*2)
                    temp_hash.append(SimHash(templater_rotated, BITS))
                    temp_hash.append(SimHash(templatel_rotated, BITS))
                test_hashes.append({"hash": temp_hash, "tfa": twofa_l})
            elif (target_eye == "L" and not left):
                left = True
                temp_hash.append(sh)
                hashes.append({"hash": temp_hash, "tfa": twofa_l})
            elif (target_eye == "R" and right):
                for i in range(1, rotations+1):
                    templater_rotated = rotateRight(template, COL*i*2)
                    templatel_rotated = rotateLeft(template, COL*i*2)
                    temp_hash.append(
                        SimHash(templater_rotated, BITS))
                    temp_hash.append(
                        SimHash(templatel_rotated, BITS))
                test_hashes.append({"hash": temp_hash, "tfa": twofa_r})
            elif (target_eye == "R" and not right):
                right = True
                temp_hash.append(sh)
                hashes.append({"hash": temp_hash, "tfa": twofa_r})

    hashes = np.array(hashes)
    test_hashes = np.array(test_hashes)
    print(len(hashes), len(test_hashes))
    start = time.process_time()
    correct_match = 0
    for data in test_hashes:
        for bc_hash in hashes:
            best_dist = 1
            for rotation in data["hash"]:
                dist = hamming_distance(rotation, bc_hash["hash"])
                if(dist < best_dist):
                    best_dist = dist
            if (best_dist <= THRESHOLD):
                if (bc_hash["tfa"] == data["tfa"]):
                    correct_match += 1
    print(correct_match / len(test_hashes))
    print("time taken : ", time.process_time() - start)
    ##np.save(hashesToBeStored, hashes, allow_pickle=True)


#the following line can be commented out after first run
getRandomVectors()
vectors = np.load(vectorsFile)  #load the stored vectors
#the following line can be commented out after first run
hashAllScans(4)
#Uncomment the next line to hash the iris scans with the Left and Right eye combined
#hashAllPairScans(1)
hashes = np.load(hashesFile, allow_pickle=True)  #load the stored hashes
#Uncomment the next line to load the iris scans with the Left and Right eye combined
#hashes = np.load(hashesPairFile, allow_pickle=True)

compareAllIrisHashes(skip=1, rotations=4)
#Uncomment the next line to compare the iris scans with the Left and Right eye combined
#compareAllIrisHashesPerson(skip=10, rotations=1)

"""
Set up
1. Download the iris template extractor https://github.com/bradishp/IrisTemplateExtractor
2. Find a dataset such as the CASIA-iris-interval and run the extraction code on the dataset
3. Store the results in the same path as this script
4. Call the results folder "result{angular_res}{radial_res}"
5. Change the "RES" constant to {angular_res}{radial_res} such as "20028" for 200 angular resolution
   and 28 radial resolution.
6. Change any constants to the necessary values - ROW = 2*angular_res and COL = radial_res
7. Run the program to hash and compare all scans in the dataset.
"""
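The relationship between the extractor resolutions and the script constants described in the set-up steps can be sketched as a small helper. The function name `derive_constants` is hypothetical and is not part of the original script; it only restates the rules from the steps (RES is the two resolutions concatenated, ROW is twice the angular resolution, COL is the radial resolution).

```python
def derive_constants(angular_res, radial_res):
    """Derive the script constants from the template extractor's resolutions."""
    res = f"{angular_res}{radial_res}"
    return {
        "RES": res,
        "ROW": 2 * angular_res,
        "COL": radial_res,
        "dataFilePath": f"result{res}",
    }


print(derive_constants(200, 28))
# {'RES': '20028', 'ROW': 400, 'COL': 28, 'dataFilePath': 'result20028'}
```

For the 200 by 28 resolution used in the listing, this reproduces the values hard-coded at the top of the script.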


Smart Contract Code 


//SPDX-License-Identifier: Unlicense
pragma solidity ^0.8.0;
import "@openzeppelin/contracts/access/Ownable.sol";
/**
 * @title PrivacyPreserving
 * @dev A privacy preserving smart contract
 */

contract PermissionedPrivacy is Ownable {

    struct hashOfScan {
        uint256 left; //MS 256 bits
        uint256 right; //LS 256 bits
    }

    event RegisteredHashOfScan(uint256 left, uint256 right);
    event StoredRecordLocation(uint256 indexed _transactionID, string _record);
    event CreatedTransaction(uint256 indexed _publicID, uint256 _hashOfRecord);

    hashOfScan[] internal hashOfScans;
    //map the MS 256 bits of the Hash of scan to the least significant ones
    mapping(uint256 => uint256) internal isRegistered;
    //map publicID -> list of hashes of records
    mapping(uint256 => uint256[]) internal transactions;
    mapping(uint256 => string) internal transactionIDToRecord;

    /**
     * @dev Register the hash of a user's iris scan
     * @param _left MS 256 bits of LSH of the scan
     * @param _right LS 256 bits of LSH of the scan
     */
    function registerHashOfScan(uint256 _left, uint256 _right) external {
        require(isRegistered[_left] != _right, "That hash of scan is already registered.");
        hashOfScans.push(hashOfScan(_left, _right));
        isRegistered[_left] = _right;
        emit RegisteredHashOfScan(_left, _right);
    }
    /**
     * @dev Register multiple hashes of a user's iris scan - permissioned for the contract owner
     * @param _hashOfScans LSH of the scan, in 2 index step, each even index marks the MS 256 bits
     * of a new scan and even+1 marks LS 256 bits
     */
    function batchRegisterHashOfScan(uint256[] calldata _hashOfScans) external onlyOwner {
        //require(isRegistered[_hashOfScan] == false, "That hash of scan is already registered.");
        uint256 hashOfScansLength = _hashOfScans.length;
        require(hashOfScansLength % 2 == 0, "Each Hash Of Scan must contain the MS 256 and LS 256 bits.");
        for(uint i = 0; i < hashOfScansLength; i+=2) {
            hashOfScans.push(hashOfScan(_hashOfScans[i], _hashOfScans[i+1]));
            isRegistered[_hashOfScans[i]] = _hashOfScans[i+1];
        }
    }
    /**
     * @dev Return hashOfScans
     * @return value of 'hashOfScans'
     */
    function getHashOfScans() external view returns (hashOfScan[] memory) {
        return hashOfScans;
    }
    /**
     * @dev Store a transaction to record the details of a record to store
     * @param _transactionID The transactionID to map the location of the record
     * @param _record The storage location of the encrypted record
     */
    function storeRecordLocation(uint256 _transactionID, string memory _record) external {
        require(keccak256(bytes(transactionIDToRecord[_transactionID])) == keccak256(bytes("")), "That transactionID already exists.");
        transactionIDToRecord[_transactionID] = _record;
        emit StoredRecordLocation(_transactionID, _record);
    }
    /**
     * @dev Retrieve the storage location of the record mapped to a transactionID
     * @param _transactionID The transactionID mapping to the location of the encrypted record
     * @return value of the location of the record if it exists
     */
    function retrieveRecordLocation(uint256 _transactionID) external view returns (string memory) {
        string memory recordLocation = transactionIDToRecord[_transactionID];
        require(keccak256(bytes(recordLocation)) != keccak256(bytes("")), "The location of the record is empty.");
        return recordLocation;
    }
    /**
     * @dev Store a hash of a record in a mapping from the publicID to the hashes of the records
     * @param _publicID The public ID
     * @param _hashOfRecord The hash of the record
     */
    function addTransaction(uint256 _publicID, uint256 _hashOfRecord) external {
        transactions[_publicID].push(_hashOfRecord);
        emit CreatedTransaction(_publicID, _hashOfRecord);
    }
    /**
     * @dev Return transactions of given _publicID
     * @param _publicID The public ID of the transactions
     * @return value of 'transactions[_publicID]'
     */
    function getTransactions(uint256 _publicID) external view returns (uint256[] memory) {
        return transactions[_publicID];
    }
}
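The contract stores each 512-bit hash as two `uint256` words, the most significant and least significant 256 bits. A minimal Python sketch of how a client might split a hash into those two words, and join them again, is given below. The helper names `split_hash_bits` and `join_hash_words` are hypothetical, and the bit ordering (most significant bit first) is an assumption rather than something the listings confirm.

```python
def split_hash_bits(bits):
    """Split a 512-bit hash into the (left, right) uint256 pair.

    bits: sequence of 512 zeros and ones, most significant bit first (assumed).
    left holds the most significant 256 bits, right the least significant 256.
    """
    if len(bits) != 512:
        raise ValueError("expected 512 bits")
    value = int("".join(str(int(b)) for b in bits), 2)
    left = value >> 256
    right = value & ((1 << 256) - 1)
    return left, right


def join_hash_words(left, right):
    """Recombine the two 256-bit words into one 512-bit integer."""
    return (left << 256) | right


bits = [1] + [0] * 510 + [1]
left, right = split_hash_bits(bits)
print(left == 1 << 255, right == 1)  # True True
```

Python integers have arbitrary precision, so both words fit without overflow; each stays below 2**256 and can therefore be passed as a `uint256` argument.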


Web App Link 


https://theprivacyprotocol.xyz/ 
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